Preface



This book is intended for students and researchers working in the fields of
time series analysis and estimation in finite populations. It collects papers by
Rajesh Singh, Florentin Smarandache, Shweta Maurya, Ashish K. Singh, Manoj Kr.
Chaudhary, V. K. Singh, Mukesh Kumar and Sachin Malik. The first chapter deals with a
problem of time series analysis, and the remaining four chapters deal with problems of
estimation in finite populations.

The book is divided into five chapters as follows:

Chapter 1. Water pollution is a major global problem. In this chapter, time series analysis
is carried out to study the effect of certain pollutants on the water of Ramgarh Lake in
Rajasthan, India.

Chapter 2. This chapter discusses a family of factor-type estimators for estimating the
population mean of a stratified population in the presence of non-response. The choice of
an appropriate estimator in the family, in order to attain a desired level of accuracy in
the presence of non-response, is worked out.

Chapter 3. The aim of this chapter is to discuss the existing allocation schemes in the
presence of non-response and to suggest some new allocation schemes that utilize
knowledge of the response and non-response rates of the different strata.

Chapter 4. This chapter suggests an improved estimator of the population mean in
stratified sampling in the presence of auxiliary information.

Chapter 5. This chapter proposes some estimators of the population variance of the
variable under study that make use of information on the population proportion
possessing a certain attribute.



The Editors 
Time Series Analysis of Water Quality
of Ramgarh Lake of Rajasthan 



Rajesh Singh, Shweta Maurya 
Department of Statistics, Banaras Hindu University 
Varanasi-221005, INDIA 



Ashish K. Singh
Raj Kumar Goel Institute of Technology, Ghaziabad, India

Florentin Smarandache
Department of Mathematics, University of New Mexico, Gallup, USA



Abstract 

In this chapter an attempt has been made to study the effect of certain pollutants
on the water of Ramgarh Lake in Rajasthan. Time series analysis of the observed data has
been carried out using trend, single exponential smoothing and double exponential
smoothing methods.

Keywords: Pollutants, trend, single and double exponential smoothing, time series.

1. Introduction 

Seventy percent of the earth's surface is covered by water. Water is undoubtedly
the most precious natural resource that exists on our planet. Water is an important
component of the ecosystem, and any imbalance, whether in the amount of water
present or in the impurities added to it, can harm the whole ecosystem. Water pollution
occurs when a body of water is adversely affected by the addition of large amounts of
pollutant materials. Water is considered polluted when it is unfit for its intended use.
There are various sources of water pollution (for details refer to Jain (2011)).

Some of the important water quality factors are:

1) Dissolved oxygen (D.O.)

2) Biological oxygen demand (B.O.D.)

3) Nitrate

4) Coliform

5) pH

Chemical analysis of a sample of water gives a complete picture of its
physical and chemical constituents, but it yields only certain numerical values at one
point in time. For assessing the quality of water over time, a time series approach
known as the water quality trend has been developed, which gives an idea of the whole
system over a long period (see Jain (2011)).

In this chapter we calculate the trend values for five water parameters of
Ramgarh Lake in Rajasthan for the years 1995-2006 and for three parameters of Mahi river
for the years 1997-2008. Trend analysis and exponential smoothing methods are used to
analyze the data.

2. Methodology 

After ensuring the presence of trend in the data, smoothing of the data is the next
requirement for time series analysis. The common smoothing techniques discussed by
Gardner (1985) are trend, simple exponential smoothing (SES), double exponential
smoothing (DES), triple exponential smoothing (TES) and adaptive response rate simple
exponential smoothing (ARRSES). Jain (2011) used the trend method to analyze the data.
We have extended the work of Jain (2011) by analyzing the data using SES and DES and
comparing these methods with the help of the available information. The methods are
described below:

1. Fitting of Straight Line

The equation of the straight line is

$U_t = a + b\,t$

where

$U_t$ = observed value of the data, $a$ = intercept, $b$ = slope of the straight line and
$t$ = time (in years).

Calculation of a and b:

The normal equations for calculating $a$ and $b$ are

$\sum U_t = n\,a + b \sum t$

$\sum t\,U_t = a \sum t + b \sum t^2$
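
As an illustration, the two normal equations can be solved directly for the intercept and the slope. The following is a minimal Python sketch, not the authors' computation: the function name `fit_line` is hypothetical, and the centred time coding t = -9, -7, ..., 9 is an assumption that is not stated in the text. The observations are the D.O. values listed in Table 1. Under this assumed coding the coefficients come out close to the fitted line reported for D.O. in the results.

```python
def fit_line(t, u):
    """Solve the normal equations
         sum(u)   = n*a      + b*sum(t)
         sum(t*u) = a*sum(t) + b*sum(t^2)
    for the intercept a and the slope b."""
    n = len(t)
    sum_t = sum(t)
    sum_u = sum(u)
    sum_tu = sum(ti * ui for ti, ui in zip(t, u))
    sum_tt = sum(ti * ti for ti in t)
    b = (n * sum_tu - sum_t * sum_u) / (n * sum_tt - sum_t ** 2)
    a = (sum_u - b * sum_t) / n
    return a, b


# Observed D.O. values for the ten recorded years (Table 1).
do_observed = [5.12, 5.75, 5.26, 5.72, 4.64, 5.14, 4.23, 6.026, 4.46, 5.52]
# Assumed time coding, centred so that sum(t) = 0 (not given in the text).
t_coded = list(range(-9, 10, 2))

a, b = fit_line(t_coded, do_observed)
print(f"a = {a:.4f}, b = {b:.4f}")
```

With a centred coding the intercept is simply the mean of the observations, which is why it is a convenient choice for hand calculation.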

2. Single Exponential Smoothing

The basic equation of exponential smoothing is

$S_t = \alpha\, y_{t-1} + (1 - \alpha)\, S_{t-1}, \quad 0 < \alpha < 1$

and the parameter $\alpha$ is called the smoothing constant.

Here, $S_t$ stands for the smoothed observation or EWMA (exponentially weighted moving
average) and $y$ stands for the original observation. The subscripts refer to the time
periods $1, 2, 3, \ldots, n$.

The smoothed series starts with the smoothed version of the second observation.
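
The recursion can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and the starting value S_2 = y_1 is an assumption (a common convention) that is consistent with the first smoothed entry shown in Table 1. The data are the observed D.O. values from Table 1.

```python
def single_exponential_smoothing(y, alpha):
    """Return [S_2, ..., S_n], where S_2 = y_1 (assumed start) and
    S_t = alpha * y_{t-1} + (1 - alpha) * S_{t-1} for t = 3, ..., n."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    smoothed = [y[0]]                 # S_2 = y_1
    for previous_obs in y[1:-1]:      # y_2, ..., y_{n-1}
        smoothed.append(alpha * previous_obs + (1 - alpha) * smoothed[-1])
    return smoothed


# Observed D.O. values for the ten recorded years (Table 1).
do_observed = [5.12, 5.75, 5.26, 5.72, 4.64, 5.14, 4.23, 6.026, 4.46, 5.52]
ses = single_exponential_smoothing(do_observed, alpha=0.1)
for value in ses:
    print(round(value, 6))
```

Because each smoothed value uses only the previous observation, the series has one entry fewer than the data.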

3. Double Exponential Smoothing

Single smoothing does not excel in following the data when there is a trend. This
situation can be improved by the introduction of a second equation with a second
constant $\gamma$, which must be chosen in conjunction with $\alpha$.

$S_t = \alpha\, y_t + (1 - \alpha)(S_{t-1} + b_{t-1}), \quad 0 < \alpha < 1$

$b_t = \gamma\,(S_t - S_{t-1}) + (1 - \gamma)\, b_{t-1}, \quad 0 < \gamma < 1$
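
A minimal Python sketch of the two equations follows. The text does not state how the recursion is started, so the initial values used here are assumptions: S_1 = y_1 and b_1 = (y_4 - y_1)/3, one of the standard choices, which appears to be consistent with the first fitted values in Table 1. The function name is hypothetical and the data are the observed D.O. values from Table 1.

```python
def double_exponential_smoothing(y, alpha, gamma):
    """Holt's two-equation smoothing.

    Returns (levels, trends) with one entry per observation.
    Assumed initial values: S_1 = y_1 and b_1 = (y_4 - y_1) / 3.
    """
    if len(y) < 4:
        raise ValueError("need at least four observations for the assumed start")
    levels = [y[0]]
    trends = [(y[3] - y[0]) / 3.0]
    for obs in y[1:]:
        new_level = alpha * obs + (1 - alpha) * (levels[-1] + trends[-1])
        new_trend = gamma * (new_level - levels[-1]) + (1 - gamma) * trends[-1]
        levels.append(new_level)
        trends.append(new_trend)
    return levels, trends


# Observed D.O. values for the ten recorded years (Table 1).
do_observed = [5.12, 5.75, 5.26, 5.72, 4.64, 5.14, 4.23, 6.026, 4.46, 5.52]
levels, trends = double_exponential_smoothing(do_observed, alpha=0.9, gamma=0.1)
for level in levels:
    print(round(level, 6))
```

The second equation updates the slope estimate from the change in level, which is what lets the smoothed series follow a trend.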

For forecasting with single and double exponential smoothing, the following methods are used.

Forecasting with single exponential smoothing

$S_t = \alpha\, y_{t-1} + (1 - \alpha)\, S_{t-1}, \quad 0 < \alpha < 1$

The new forecast is the old one plus an adjustment for the error that occurred in the last
forecast.

Bootstrapping of forecasts

$S_{t+1} = \alpha\, y_{\text{origin}} + (1 - \alpha)\, S_t$

This formula is used when forecasting from the last data point and no further actual
observations are available.

Forecasting with double exponential smoothing

The one-period-ahead forecast is given by:

$F_{t+1} = S_t + b_t$

The m-periods-ahead forecast is given by:

$F_{t+m} = S_t + m\, b_t$

(for details of these methods refer to Gardner (1985)).
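
The two forecasting rules can be sketched as follows. This is an illustrative Python sketch with hypothetical function names. The bootstrapped forecasts start from the last observed D.O. value (5.52) and the last single-smoothed value shown in Table 1 (5.106417). The level and trend passed to the second function are made-up numbers, used only to show the arithmetic of the m-periods-ahead rule.

```python
def bootstrap_ses_forecasts(origin_obs, last_smoothed, alpha, horizon):
    """Repeatedly apply S_{t+1} = alpha * y_origin + (1 - alpha) * S_t,
    holding the last available observation fixed."""
    forecasts = []
    s = last_smoothed
    for _ in range(horizon):
        s = alpha * origin_obs + (1 - alpha) * s
        forecasts.append(s)
    return forecasts


def holt_forecasts(level, trend, horizon):
    """m-periods-ahead forecasts F_{t+m} = S_t + m * b_t for m = 1..horizon."""
    return [level + m * trend for m in range(1, horizon + 1)]


# Last observation and last single-smoothed value for D.O. (Table 1).
ses_forecasts = bootstrap_ses_forecasts(5.52, 5.106417, alpha=0.1, horizon=3)
print([round(f, 6) for f in ses_forecasts])

# Hypothetical level and trend, for illustration only.
print(holt_forecasts(level=10.0, trend=2.0, horizon=3))
```

The bootstrapped forecasts move geometrically towards the last observation, whereas the double-smoothing forecasts continue along a straight line with slope b_t.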

4. Results and Discussion 



Table 1 shows the data on Dissolved Oxygen (D.O.) for Ramgarh Lake for the
years 1995-2006 and the fitted values using trend, single and double exponential smoothing.

Table 1: Data on Dissolved Oxygen (D.O.) for Ramgarh Lake for the years 1995-2006
and fitted values using trend, single and double exponential smoothing

Year    Observed   Trend     Single exponential   Double exponential
        data       values    smoothing            smoothing
                             (alpha = 0.1)        (alpha = 0.9, gamma = 0.1)
1995    5.12       5.3372    -                    5.12
1996    5.75       5.3036    5.12                 5.707
1997    5.26       5.27      5.183                5.32857
1998    5.72       5.2364    5.1907               5.698556
1999    4.64       5.2028    5.24363              4.765484
2000    5.14       5.1692    5.183267             5.110884
2003    4.23       5.1356    5.17894              4.329044
2004    6.026      5.102     5.084046             5.858346
2005    4.46       5.0684    5.178242             4.616965
2006    5.52       5.0348    5.106417             5.4327
Total   51.866     51.86     46.46824             51.96755
For trend, the fitted line is

$U_t = 5.1866 - 0.0169\,t$

with mean squared error (MSE) 0.3077. For single exponential smoothing, various values
of $\alpha$ were tried, and the minimum MSE = 0.3915 was obtained for $\alpha = 0.1$. For
smoothing of the data, Holt's double exponential smoothing was found to be most
appropriate. Various combinations of $\alpha$ and $\gamma$, both ranging between 0.1 and 0.9
in increments of 0.1, were tried, and MSE = 0.0094 was least for $\alpha = 0.9$ and
$\gamma = 0.1$.
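
The search over the grid of smoothing constants can be sketched in Python as below. This is a sketch under stated assumptions rather than the authors' procedure: the function names are hypothetical, the recursion is started with S_1 = y_1 and b_1 = (y_4 - y_1)/3, and the MSE is taken as the mean squared difference between each observation and its smoothed level over all ten years. Under these assumptions the MSE at alpha = 0.9, gamma = 0.1 comes out at about 0.0094.

```python
def holt_levels(y, alpha, gamma):
    """Smoothed levels from Holt's method.
    Assumed start: S_1 = y_1, b_1 = (y_4 - y_1) / 3."""
    level = y[0]
    trend = (y[3] - y[0]) / 3.0
    levels = [level]
    for obs in y[1:]:
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        trend = gamma * (new_level - level) + (1 - gamma) * trend
        level = new_level
        levels.append(level)
    return levels


def mean_squared_error(observed, fitted):
    """Mean of squared differences between observations and fitted values."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted)) / len(observed)


# Observed D.O. values for the ten recorded years (Table 1).
do_observed = [5.12, 5.75, 5.26, 5.72, 4.64, 5.14, 4.23, 6.026, 4.46, 5.52]

grid = [k / 10 for k in range(1, 10)]          # 0.1, 0.2, ..., 0.9
results = {
    (alpha, gamma): mean_squared_error(do_observed,
                                       holt_levels(do_observed, alpha, gamma))
    for alpha in grid
    for gamma in grid
}
best = min(results, key=results.get)
print("best (alpha, gamma):", best, "MSE:", round(results[best], 4))
print("MSE at (0.9, 0.1):", round(results[(0.9, 0.1)], 4))
```

An exhaustive grid is cheap here because there are only 81 combinations and ten observations per series.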

Figure 1: Graph of observed data and fitted values using trend, single and double
exponential smoothing of Dissolved Oxygen (D.O.) for Ramgarh Lake for
the years 1995-2006

[Plot of dissolved oxygen against year. Series: observed data; trend values; single
exponential smoothing (alpha = 0.1); double exponential smoothing (alpha = 0.9,
gamma = 0.1).]



Adequate dissolved oxygen is necessary for good water quality. Oxygen is a
necessary element for all forms of life. Natural stream purification processes require
adequate oxygen levels in order to provide for aerobic life forms. As dissolved oxygen
levels in water drop below 5.0 mg/l, aquatic life is put under stress (for details see
www.state.ky.us).

From Table 1 and Figure 1, we observe that the level of D.O. in Ramgarh Lake was
above the required standard of 5.0 mg/l, except for the years 1999, 2003 and 2005.
Table 2: Comparison of forecasts

Period   Observed data   Forecast (single)   Forecast (double)
1        5.12            -                   5.32
2        5.75            5.7437              6.23084
3        5.26            5.25923             5.77869651
4        5.72            5.714707            6.510832048
5        4.64            4.6460363           5.013406419
6        5.14            5.14043267          5.815207177
7        4.23            4.239489403         4.32655631
8        6.026           6.016580463         7.511558059
9        4.46            4.467182416         4.621632535
10       5.52            5.515864175         6.686376834
11       -               5.147775731         5.6266
12       -               5.184998158         5.7442
13       -               5.218498342         5.8618
14       -               5.248648508         5.9794
15       -               5.275783657         6.097
16       -               5.300205291         6.2146
17       -               5.322184762         6.3322
18       -               5.341966286         6.4498
19       -               5.359769657         6.5674
20       -               5.375792692         6.685

Figure 2: Graph of forecasts 
Table 3 shows the data on Nitrate for Ramgarh Lake for the years 1995-2006
and the fitted values using trend, single and double exponential smoothing.

Table 3: Data on Nitrate for Ramgarh Lake for the years 1995-2006 and fitted values
using trend, single and double exponential smoothing

Year (x)   Observed   Trend     Single exponential   Double exponential
           data       values    smoothing            smoothing
                                (alpha = 0.1)        (alpha = 0.9, gamma = 0.1)
1995       0.32       0.588     -                    0.32
1996       1.28       0.546     0.32                 1.176
1997       0.38       0.504     0.416                0.46096
1998       0.08       0.462     0.4124               0.104883
1999       0.04       0.42      0.37916              0.028797
2000       0.92       0.378     0.345244             0.815205
2003       0.24       0.336     0.40272              0.300708
2004       0.25       0.294     0.386448             0.247331
2005       0.186      0.252     0.372803             0.184874
2006       0.3        0.21      0.354123             0.281431
Total      3.996      3.99      3.388897             3.920189



For trend, the fitted line is

$U_t = 0.3996 - 0.0216\,t$

with MSE = 0.3078. For single exponential smoothing, various values of $\alpha$ were tried,
and the minimum MSE = 0.3412 was obtained for $\alpha = 0.1$. For smoothing of the data,
Holt's double exponential smoothing was found to be most appropriate. Various
combinations of $\alpha$ and $\gamma$, both ranging between 0.1 and 0.9 in increments of 0.1,
were tried, and MSE = 0.0033 was least for $\alpha = 0.9$ and $\gamma = 0.1$.

Nitrites can produce a serious condition in fish called "brown blood disease."
Nitrites also react directly with hemoglobin in the blood of humans and other
warm-blooded animals to produce methemoglobin. Methemoglobin destroys the ability of
red blood cells to transport oxygen. This condition is especially serious in babies under
three months of age. It causes a condition known as methemoglobinemia or "blue baby"
disease. Water with nitrite levels exceeding 1.0 mg/l should not be used for feeding
babies. Nitrite/nitrogen levels below 90 mg/l and nitrate levels below 0.5 mg/l seem to
have no effect on warm water fish (for details see www.state.ky.us).

From Table 3 and Figure 3, we observe that the level of Nitrate in Ramgarh Lake
was below the standard of 1.0 mg/l, except for the year 1996.



Figure 3: Graph of observed data and fitted values using trend, single and double
exponential smoothing of Nitrate for Ramgarh Lake for the years 1995-2006

[Plot of nitrate against year. Series: observed data; trend values; single exponential
smoothing (alpha = 0.1); double exponential smoothing (alpha = 0.9, gamma = 0.1).]






Table 4: Comparison of forecasts

Period   Observed data   Forecast (single)   Forecast (double)
1        0.32            0.416               0.24
2        1.28            0.4124              1.1896
3        0.38            0.37916             0.328832
4        0.08            0.345244            -0.07203
5        0.04            0.40272             -0.12795
6        0.92            0.386448            0.847085
7        0.24            0.372803            0.223314
8        0.25            0.354123            0.17474
9        0.186           0.34871             0.114309
10       0.3             0.348711            0.244291
11       -               0.34384             0.24426
12       -               0.339456            0.20712
13       -               0.33551             0.16998
14       -               0.331959            0.13284
15       -               0.328763            0.0957
16       -               0.325887            0.05856
17       -               0.323298            0.02142
18       -               0.320968            -0.01572
19       -               0.318872            -0.05286
20       -               0.316984            -0.09



Figure 4: Graph of forecasts

[Plot titled "Comparison of forecasts". Series: observed data; forecast (single);
forecast (double).]
Table 5 shows the data on B.O.D. for Ramgarh Lake for the years 1995-2006
and the fitted values using trend, single and double exponential smoothing.



Table 5: Data on Biological oxygen demand (B.O.D.) for Ramgarh Lake for the years
1995-2006 and fitted values using trend, single and double exponential
smoothing

Year (x)   Observed   Trend     Single exponential   Double exponential
           data       values    smoothing            smoothing
                                (alpha = 0.1)        (alpha = 0.9, gamma = 0.1)
1995       1.96       5.122     -                    1.96
1996       14.09      4.762     1.96                 12.87167
1997       1.84       4.402     3.173                3.047483
1998       1.8        4.042     3.0397               1.920392
1999       1.46       3.682     2.91573              1.490847
2000       2.78       3.322     2.770157             2.633116
2003       2.76       2.962     2.771141             2.742563
2004       1.976      2.602     2.770027             2.049477
2005       2.78       2.242     2.690624             2.697155
2006       3.58       1.882     2.699562             3.489379
Total      35.026     35.02     24.78994             34.90208



Biochemical oxygen demand is a measure of the quantity of oxygen used by
microorganisms (e.g., aerobic bacteria) in the oxidation of organic matter. Natural
sources of organic matter include plant decay and leaf fall. However, plant growth and
decay may be unnaturally accelerated when nutrients and sunlight are overly abundant
due to human influence. Urban runoff carries pet wastes from streets and sidewalks;
nutrients from lawn fertilizers; and leaves, grass clippings, and paper from residential
areas, all of which increase oxygen demand. Oxygen consumed in the decomposition process
robs other aquatic organisms of the oxygen they need to live. Organisms that are more
tolerant of lower dissolved oxygen levels may replace a diversity of more sensitive
organisms. Natural water systems contain bacteria, which need oxygen (aerobic) to
survive. Most of them feed on dead algae and other dead organisms and are part of the
decomposition cycle. Algae and other producers in the water take up inorganic nutrients
and use them in the process of building up their organic tissues (for details refer to
www.freedrinkingwater.com).
For trend, the fitted line is

$U_t = 3.5026 + 0.18094\,t$

with MSE = 11.74366. For single exponential smoothing, various values of $\alpha$ were tried,
and the minimum MSE = 17.1093 was obtained for $\alpha = 0.1$. For smoothing of the data,
Holt's double exponential smoothing was found to be most appropriate. Various
combinations of $\alpha$ and $\gamma$, both ranging between 0.1 and 0.9 in increments of 0.1,
were tried, and MSE = 0.3000 was least for $\alpha = 0.9$ and $\gamma = 0.1$.



Figure 5: Graph of observed data and fitted values using trend, single and double
exponential smoothing of B.O.D. for Ramgarh Lake for the years 1995-2006

[Plot of biological oxygen demand against year (1995-2000, 2003-2006). Series: observed
data; trend values; single exponential smoothing (alpha = 0.1); double exponential
smoothing (alpha = 0.9, gamma = 0.1).]
Table 6: Comparison of forecasts

Period   Observed data   Forecast (single)   Forecast (double)
1        1.96            -                   1.906667
2        14.09           3.173               13.91483
3        1.84            3.0397              3.003915
4        1.8             2.91573             1.768471
5        1.46            2.770157            1.311164
6        2.78            2.771141            2.585629
7        2.76            2.770027            2.710768
8        1.976           2.690624            1.951553
9        2.78            2.699562            2.673792
10       3.58            2.787606            3.547575
11       -               2.7871              3.5474
12       -               2.86639             3.6055
13       -               2.937751            3.6636
14       -               3.001976            3.7217
15       -               3.059778            3.7798
16       -               3.1118              3.8379
17       -               3.15862             3.896
18       -               3.200758            3.9541
19       -               3.238683            4.0122
20       -               3.272814            4.0703



Figure 6 : Graph of forecasts 
Table 7 shows the data on Total Coliform for Ramgarh Lake for the years
1995-2006 and the fitted values using trend, single and double exponential smoothing.



Table 7: Data on Total Coliform for Ramgarh Lake for the years 1995-2006 and fitted
values using trend, single and double exponential smoothing

Year (x)   Observed   Trend      Single exponential   Double exponential
           data       values     smoothing            smoothing
                                 (alpha = 0.6)        (alpha = 0.9, gamma = 0.1)
1995       1169       687.894    -                    1169
1996       285        615.314    1169                 339.2333
1997       840.75     542.734    638.6                751.5507
1998       144        470.154    759.89               173.7353
1999       65.33      397.574    390.356              42.47463
2000       121.5      324.994    195.3404             81.95854
2003       119        252.414    151.0362             87.21566
2004       720.6      179.834    131.8145             632.042
2005       86         107.254    485.0858             123.3548
2006       61.66      34.674     245.6343             47.21817
Total      3612.84    3612.84    4166.757             3447.783



Total coliform bacteria are a collection of relatively harmless microorganisms that 
live in large numbers in the intestines of man and warm- and cold-blooded animals. They 
aid in the digestion of food. A specific subgroup of this collection is the fecal coliform 
bacteria, the most common member being Escherichia coli. These organisms may be 
separated from the total coliform group by their ability to grow at elevated temperatures 
and are associated only with the fecal material of warm-blooded animals. The presence of 
fecal coliform bacteria in aquatic environments indicates that the water has been 
contaminated with the fecal material of man or other animals. The presence of fecal 
contamination is an indicator that a potential health risk exists for individuals exposed to 
this water (for details see www.state.ky.us) . 
For trend, the fitted line is

$U_t = 361.284 - 36.2989\,t$

with MSE = 99896.33. For single exponential smoothing, various values of $\alpha$ were tried,
and the minimum MSE = 205949.6 was obtained for $\alpha = 0.6$. For smoothing of the data,
Holt's double exponential smoothing was found to be most appropriate. Various
combinations of $\alpha$ and $\gamma$, both ranging between 0.1 and 0.9 in increments of 0.1,
were tried, and MSE = 2432.458 was least for $\alpha = 0.9$ and $\gamma = 0.1$.

Figure 7: Graph of observed data and fitted values using trend, single and double
exponential smoothing of Total Coliform for Ramgarh Lake for the years
1995-2006.

[Plot of total coliform against year (1995-2000, 2003-2006). Series: observed data;
trend values; single exponential smoothing (alpha = 0.6); double exponential smoothing
(alpha = 0.9, gamma = 0.1).]
Table 8: Comparison of forecasts

Period   Observed data   Forecast (single)   Forecast (double)
1        1169            -                   827.3333
2        285             638.6               -51.2433
3        840.75          759.89              441.3534
4        144             390.356             -163.224
5        65.33           195.3404            -273.915
6        121.5           151.0362            -198.843
7        119             131.8145            -164.98
8        720.6           485.0858            459.5482
9        86              245.6343            -82.7583
10       61.66           135.2497            -145.897
11       -               135.248             -145.897
12       -               91.0952             -339.012
13       -               73.43408            -532.127
14       -               66.36963            -725.242
15       -               63.54385            -918.357
16       -               62.41354            -1111.47
17       -               61.96142            -1304.59
18       -               61.78057            -1497.7
19       -               61.70823            -1690.82
20       -               61.67929            -1883.93



Figure 8: Comparison of forecasts 
Table 9 shows the data on pH for Ramgarh Lake for the years 1995-2006 and the fitted
values using trend, single and double exponential smoothing.

Table 9: Data on pH for Ramgarh Lake for the years 1995-2006 and fitted values using
trend, single and double exponential smoothing

Year (x)   Observed   Trend     Single exponential   Double exponential
           data       values    smoothing            smoothing
                                (alpha = 0.2)        (alpha = 0.9, gamma = 0.1)
1995       7.64       6.29      -                    7.64
1996       7.48       6.65      7.64                 7.509667
1997       7.87       7.01      7.608                7.844963
1998       8.05       7.37      7.6604               8.042746
1999       8.44       7.73      7.73832              8.414177
2000       8.3        8.09      7.878656             8.327645
2003       7.25       8.45      7.962925             7.371503
2004       8.28       8.81      7.82034              8.191954
2005       7.87       9.17      7.912272             7.912923
2006       8.006      9.53      7.903817             8.003557
Total      79.186     79.1      70.12473             79.25914



pH is a measure of the acidic or basic (alkaline) nature of a solution. The
concentration of the hydrogen ion [H+] activity in a solution determines the pH. A pH
range of 6.0 to 9.0 appears to provide protection for the life of freshwater fish and
bottom-dwelling invertebrates. The most significant environmental impact of pH involves
synergistic effects. Synergy involves the combination of two or more substances which
produce effects greater than their sum. This process is important in surface waters.
Runoff from agricultural, domestic, and industrial areas may contain iron, aluminum,
ammonia, mercury or other elements. The pH of the water will determine the toxic
effects, if any, of these substances. For example, 4 mg/l of iron would not present a toxic
effect at a pH of 4.8. However, as little as 0.9 mg/l of iron at a pH of 5.5 can cause fish to
die (for details see www.state.ky.us).

For trend, the fitted line is

$U_t = 7.9186 + 0.1845\,t$

with MSE = 0.9995. For single exponential smoothing, various values of $\alpha$ were tried,
and the minimum MSE = 0.1831 was obtained for $\alpha = 0.2$. For smoothing of the data,
Holt's double exponential smoothing was found to be most appropriate. Various
combinations of $\alpha$ and $\gamma$, both ranging between 0.1 and 0.9 in increments of 0.1,
were tried, and MSE = 0.002735 was least for $\alpha = 0.9$ and $\gamma = 0.1$.

Figure 9: Graph of observed data and fitted values using trend, single and double
exponential smoothing of pH for Ramgarh Lake for the years 1995-2006.
Table 10: Comparison of forecasts

Period   Observed data   Forecast (single)   Forecast (double)
1        7.64            -                   7.776667
2        7.48            6.732               7.619633
3        7.87            7.083               7.977463
4        8.05            7.245               8.181774
5        8.44            7.596               8.576446
6        8.3             7.47                8.465033
7        7.25            6.525               7.399539
8        8.28            7.452               8.299231
9        7.87            7.083               7.981569
10       8.006           7.2054              8.074402
11       -               7.8368              8.074395
12       -               7.85372             8.14524
13       -               7.868948            8.216085
14       -               7.882653            8.28693
15       -               7.894988            8.357775
16       -               7.906089            8.42862
17       -               7.91608             8.499465
18       -               7.925072            8.57031
19       -               7.933165            8.641155
20       -               7.940448            8.712



Figure 10: Graph of forecasts 
We observe from the calculations for the different pollutant parameters that
double smoothing follows the data much more closely than single smoothing. Furthermore,
for forecasting, single smoothing cannot do better than projecting a straight horizontal
line, which is not very likely to occur in reality. So, for forecasting purposes with our
data, double exponential smoothing is preferable.

Conclusion 

From the above discussion, we observe that the various pollutants considered in
the article may have very hazardous effects on the quality of water. An increase of
pollutants in water beyond a certain limit may be dangerous for aquatic animals. Also,
according to recent reports, most of the tap and well water in India is not safe for
drinking due to the presence of various pollutants in inappropriate proportions. We have
now reached the point where all sources of our drinking water, including municipal water
systems, wells, lakes, rivers, and even glaciers, contain some level of contamination. So,
we need to check the quality of water routinely so that we can lead a healthy life.
Abstract

The present chapter deals with the study of a general family of factor-type
estimators for estimating the population mean of a stratified population in the presence
of non-response whenever information on an auxiliary variable is available. The proposed
family includes the separate ratio, product, dual to ratio and usual sample mean
estimators as its particular cases and exhibits some nice properties as regards locating
the optimum estimator belonging to the family. The choice of an appropriate estimator in
the family, in order to get a desired level of accuracy even if non-response is high, is
also discussed. An empirical study has been carried out in support of the results.

Keywords: Factor-type estimators, Stratified population, Non-response, Optimum 
estimator, Empirical study. 

1. Introduction 

In sampling theory the use of suitable auxiliary information results in a
considerable reduction in the variance of the estimator. For this reason, many authors
have used auxiliary information at the estimation stage. Cochran (1940) was the first to
use auxiliary information at the estimation stage in estimating population parameters.
He proposed the ratio estimator to estimate the population mean or total of a character
under study. Hansen et al. (1953) suggested the difference estimator, which was
subsequently modified to give the linear regression estimator for the population mean or
its total. Murthy (1964) studied the product estimator to estimate the population
mean or total when the character under study and the auxiliary character are negatively
correlated. These estimators can be used more efficiently than the mean per unit
estimator.

Several authors have suggested estimators using some known
population parameters of an auxiliary variable. Upadhyaya and Singh (1999)
suggested a class of estimators in simple random sampling. Kadilar and Cingi (2003)
and Shabbir and Gupta (2005) extended these estimators to stratified random
sampling. Singh et al. (2008) suggested a class of estimators using power transformation
based on the estimators developed by Kadilar and Cingi (2003). Kadilar and Cingi (2005)
and Shabbir and Gupta (2006) suggested new ratio estimators in stratified sampling
to improve the efficiency of the estimators. Koyuncu and Kadilar (2008) proposed
families of estimators for estimating the population mean in stratified random sampling
by considering the estimators proposed in Searls (1964) and Khoshnevisan et al. (2007).
Singh and Vishwakarma (2008) suggested a family of estimators using
transformation in stratified random sampling. Recently, Koyuncu and Kadilar (2009)
proposed a general family of estimators, which uses the information of two auxiliary
variables in stratified random sampling to estimate the population mean of the
variable under study.

The works mentioned above are based on the assumption that
both the study and auxiliary variables are free from any kind of non-sampling error. In
practice, however, the problem of non-response often arises in sample surveys. For the
situation in which a single survey variable is under investigation, the problem of
estimating the population mean using a sub-sampling scheme was first considered by Hansen
and Hurwitz (1946). Suppose we have incomplete information on the study variable $X_0$ and
complete information on the auxiliary variable $X_1$; in other words, the study variable
is affected by non-response error but the auxiliary variable is free from non-response.
Then, utilizing the Hansen-Hurwitz (1946) technique of sub-sampling of the
non-respondents, the conventional ratio and product estimators in the presence of
non-response are respectively given by




and 



(1.2)




The purpose of the present chapter is to suggest separate-type estimators in a
stratified population for estimating the population mean, using the concept of
sub-sampling of non-respondents in the presence of non-response in the study variable. In
this context, the information on an auxiliary characteristic closely related to the study
variable has been utilized, assuming that it is free from non-response.

In order to suggest separate-type estimators, we have made use of the Factor-Type
Estimators (FTE) proposed by Singh and Shukla (1987). FTE define a class of estimators
that includes the usual sample mean estimator, the usual ratio and product estimators and
some other estimators existing in the literature. This class of estimators exhibits some
nice properties, which are discussed in subsequent sections.

2. Sampling Strategy and Estimation Procedure 

Let us consider a population consisting of $N$ units divided into $k$ strata. Let the
size of the $i$th stratum be $N_i$ $(i = 1, 2, \ldots, k)$, and suppose we decide to select a
sample of size $n$ from the entire population in such a way that $n_i$ units are selected
from the $i$th stratum. Thus, we have $\sum_{i=1}^{k} n_i = n$. Let non-response occur in
each stratum. Then, using the Hansen and Hurwitz (1946) procedure, we select a sample of
$m_i$ units out of the $n_{i2}$ non-respondent units in the $i$th stratum by simple random
sampling without replacement (SRSWOR), such that $n_{i2} = L_i m_i$, $L_i > 1$, and the
information is observed on all the $m_i$ units by the interview method.
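
The sub-sampling step within one stratum can be sketched in Python as follows. This is a hypothetical illustration: the function name, the unit labels and the rule of rounding n_i2 / L_i up to the next integer are assumptions made for the example, since the text only requires that n_i2 = L_i m_i.

```python
import math
import random


def subsample_nonrespondents(nonrespondents, L, seed=None):
    """Draw m = n2 / L non-respondents by simple random sampling without
    replacement (SRSWOR). When n2 / L is not an integer, m is rounded up
    (an assumption made for this sketch)."""
    if L < 1:
        raise ValueError("L must be at least 1")
    n2 = len(nonrespondents)
    if n2 == 0:
        return []
    m = math.ceil(n2 / L)
    rng = random.Random(seed)
    return rng.sample(nonrespondents, m)


# Hypothetical stratum with 12 non-respondent units and L = 3.
units = list(range(12))
followed_up = subsample_nonrespondents(units, L=3, seed=1)
print(len(followed_up), "units selected for interview")
```

A larger L means fewer interviews per stratum, at the cost of a larger contribution to the variance from the non-respondent part of the sample.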

The Hansen-Hurwitz estimator of the population mean $\bar{X}_{0i}$ of the study variable
$X_0$ for the $i$th stratum will be

$$T^*_{0i} = \frac{n_{i1}\,\bar{x}_{0i1} + n_{i2}\,\bar{x}_{0m_i}}{n_i}, \qquad (i = 1, 2, \ldots, k) \qquad (2.1)$$

where $\bar{x}_{0i1}$ and $\bar{x}_{0m_i}$ are the sample means based on $n_{i1}$ respondent
units and $m_i$ non-respondent units respectively in the $i$th stratum for the study
variable.

Obviously $T^*_{0i}$ is an unbiased estimator of $\bar X_{0i}$. Combining the estimators
over all the strata, we get the estimator of the population mean $\bar X_0$ of the study
variable $X_0$, given by

$$T^*_{0st} = \sum_{i=1}^{k} p_i\,T^*_{0i} \qquad (2.2)$$

where $p_i = N_i/N$, which is an unbiased estimator of $\bar X_0$. Now, we define the
estimator of the population mean $\bar X_1$ of the auxiliary variable $X_1$ as

$$T_{1st} = \sum_{i=1}^{k} p_i\,\bar x_{1i} \qquad (2.3)$$

where $\bar x_{1i}$ is the sample mean based on $n_i$ units in the $i$th stratum for the
auxiliary variable. It can easily be seen that $T_{1st}$ is an unbiased estimator of
$\bar X_1$, because $\bar x_{1i}$ gives an unbiased estimate of the population mean
$\bar X_{1i}$ of the auxiliary variable for the $i$th stratum.

3. Suggested Family of Estimators 

Let us now consider the situation in which the study variable is subject to
non-response and the auxiliary variable is free from non-response. Motivated by Singh and
Shukla (1987), we define the separate-type family of estimators of the population mean
$\bar X_0$ using factor-type estimators as

$$T_{FS}(\alpha) = \sum_{i=1}^{k} p_i\,T^*_{Fi}(\alpha) \qquad (3.1)$$

where

$$T^*_{Fi}(\alpha) = T^*_{0i}\left[\frac{(A + C)\,\bar X_{1i} + fB\,\bar x_{1i}}{(A + fB)\,\bar X_{1i} + C\,\bar x_{1i}}\right] \qquad (3.2)$$

and $f = n/N$, $A = (\alpha-1)(\alpha-2)$, $B = (\alpha-1)(\alpha-4)$,
$C = (\alpha-2)(\alpha-3)(\alpha-4)$; $\alpha > 0$.

3.1 Particular Cases of $T_{FS}(\alpha)$

Case-1: If $\alpha = 1$ then $A = B = 0$, $C = -6$, so that

$$T^*_{Fi}(1) = T^*_{0i}\,\frac{\bar X_{1i}}{\bar x_{1i}}$$

and hence

$$T_{FS}(1) = \sum_{i=1}^{k} p_i\,T^*_{0i}\,\frac{\bar X_{1i}}{\bar x_{1i}}. \qquad (3.3)$$

Thus, $T_{FS}(1)$ is the usual separate ratio estimator under non-response.

Case-2: If $\alpha = 2$ then $A = 0 = C$, $B = -2$, so that

$$T^*_{Fi}(2) = T^*_{0i}\,\frac{\bar x_{1i}}{\bar X_{1i}}$$

and hence

$$T_{FS}(2) = \sum_{i=1}^{k} p_i\,T^*_{0i}\,\frac{\bar x_{1i}}{\bar X_{1i}} \qquad (3.4)$$

which is the usual separate product estimator under non-response.

Case-3: If $\alpha = 3$ then $A = 2$, $B = -2$, $C = 0$, so that

$$T^*_{Fi}(3) = T^*_{0i}\,\frac{\bar X_{1i} - f\,\bar x_{1i}}{(1 - f)\,\bar X_{1i}}$$

and hence

$$T_{FS}(3) = \sum_{i=1}^{k} p_i\,T^*_{Fi}(3) \qquad (3.5)$$

which is the separate dual to ratio-type estimator under non-response. The dual to ratio
type estimator was proposed by Srivenkataramana (1980).

Case-4: If $\alpha = 4$ then $A = 6$, $B = 0$, $C = 0$, so that

$$T^*_{Fi}(4) = T^*_{0i}$$

and hence

$$T_{FS}(4) = \sum_{i=1}^{k} p_i\,T^*_{0i} = T^*_{0st} \qquad (3.6)$$

which is the usual mean estimator defined in a stratified population under non-response.
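
The four particular cases can be checked numerically with a short sketch of the stratum-level factor-type estimator (3.2). The function names and the input values below are illustrative assumptions; only the formula for $A$, $B$, $C$ and the multiplier comes from the text.

```python
import math

def fte_constants(alpha):
    """Return A, B, C of the factor-type estimator for a given alpha."""
    A = (alpha - 1) * (alpha - 2)
    B = (alpha - 1) * (alpha - 4)
    C = (alpha - 2) * (alpha - 3) * (alpha - 4)
    return A, B, C

def fte_stratum_estimate(alpha, t0, x_bar, X_bar, f):
    """Stratum-level factor-type estimate, equation (3.2).

    t0    : Hansen-Hurwitz estimate T*_0i of the study-variable mean
    x_bar : sample mean of the auxiliary variable in the stratum
    X_bar : known population mean of the auxiliary variable in the stratum
    f     : sampling fraction n / N
    """
    A, B, C = fte_constants(alpha)
    return t0 * ((A + C) * X_bar + f * B * x_bar) / ((A + f * B) * X_bar + C * x_bar)

# Hypothetical inputs for one stratum.
t0, x_bar, X_bar, f = 40.0, 38.0, 39.56, 60 / 284

ratio_case = fte_stratum_estimate(1, t0, x_bar, X_bar, f)    # separate ratio form
product_case = fte_stratum_estimate(2, t0, x_bar, X_bar, f)  # separate product form
dual_case = fte_stratum_estimate(3, t0, x_bar, X_bar, f)     # dual to ratio form
mean_case = fte_stratum_estimate(4, t0, x_bar, X_bar, f)     # plain mean estimator
```

Summing `p_i * fte_stratum_estimate(...)` over strata would give $T_{FS}(\alpha)$ of equation (3.1).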

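3.2 Properties of $T_{FS}(\alpha)$

Using large sample approximation, the bias of the estimator $T_{FS}(\alpha)$, up to the
first order of approximation, was obtained following Singh and Shukla (1987) as

$$B[T_{FS}(\alpha)] = E[T_{FS}(\alpha) - \bar X_0] = \phi(\alpha)\sum_{i=1}^{k} p_i\,\bar X_{0i}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)\left[\frac{C}{A + fB + C}\,C_{1i}^2 - \rho_{01i}\,C_{0i}\,C_{1i}\right] \qquad (3.7)$$

where $\phi(\alpha) = \dfrac{C - fB}{A + fB + C}$, $C_{0i} = \dfrac{S_{0i}}{\bar X_{0i}}$,
$C_{1i} = \dfrac{S_{1i}}{\bar X_{1i}}$, $S_{0i}^2$ and $S_{1i}^2$ are the population mean
squares of the study and auxiliary variables respectively in the $i$th stratum, and
$\rho_{01i}$ is the population correlation coefficient between $X_0$ and $X_1$ in the $i$th
stratum. The Mean Square Error (MSE) up to the first order of approximation was derived as

$$M[T_{FS}(\alpha)] = E[T_{FS}(\alpha) - \bar X_0]^2 = \sum_{i=1}^{k} p_i^2\,\bar X_{0i}^2\left[\frac{V(T^*_{0i})}{\bar X_{0i}^2} + \phi(\alpha)^2\,\frac{V(\bar x_{1i})}{\bar X_{1i}^2} - 2\phi(\alpha)\,\frac{Cov(T^*_{0i}, \bar x_{1i})}{\bar X_{0i}\,\bar X_{1i}}\right].$$

Since

$$V(T^*_{0i}) = \left(\frac{1}{n_i} - \frac{1}{N_i}\right)S_{0i}^2 + \frac{L_i - 1}{n_i}\,W_{i2}\,S_{0i2}^2, \qquad V(\bar x_{1i}) = \left(\frac{1}{n_i} - \frac{1}{N_i}\right)S_{1i}^2$$

and

$$Cov(T^*_{0i}, \bar x_{1i}) = \left(\frac{1}{n_i} - \frac{1}{N_i}\right)\rho_{01i}\,S_{0i}\,S_{1i} \qquad \text{[due to Singh (1998)]},$$

where $S_{0i2}^2$ is the population mean square of the non-response group in the $i$th
stratum and $W_{i2}$ is the non-response rate of the $i$th stratum in the population, we
have

$$M[T_{FS}(\alpha)] = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2\left[S_{0i}^2 + \phi(\alpha)^2 R_{01i}^2 S_{1i}^2 - 2\phi(\alpha) R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}\right] + \sum_{i=1}^{k}\frac{L_i - 1}{n_i}\,W_{i2}\,p_i^2\,S_{0i2}^2 \qquad (3.8)$$

where $R_{01i} = \dfrac{\bar X_{0i}}{\bar X_{1i}}$.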



3.3 Optimum Choice of $\alpha$

In order to obtain the minimum MSE of $T_{FS}(\alpha)$, we differentiate the MSE with
respect to $\alpha$ and equate the derivative to zero:

$$\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2\left[2\phi(\alpha)\phi'(\alpha) R_{01i}^2 S_{1i}^2 - 2\phi'(\alpha) R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}\right] = 0, \qquad (3.9)$$

where $\phi'(\alpha)$ stands for the first derivative of $\phi(\alpha)$. From the above
expression, we have

$$\phi(\alpha) = \frac{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}}{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}^2 S_{1i}^2} = V \text{ (say)}. \qquad (3.10)$$

Since $\phi(\alpha) = (C - fB)/(A + fB + C)$ is a ratio of polynomials in $\alpha$ of degree
at most three, equation (3.10) is a cubic equation in the parameter $\alpha$. Therefore,
equation (3.10) has at most three real roots at which the MSE of the estimator
$T_{FS}(\alpha)$ attains its minimum.

Let equation (3.10) yield the solutions $\alpha_0$, $\alpha_1$ and $\alpha_2$, at each of
which $M[T_{FS}(\alpha)]$ is the same. A criterion for choosing between $\alpha_0$,
$\alpha_1$ and $\alpha_2$ is: "compute the bias of the estimator at $\alpha = \alpha_0$,
$\alpha_1$ and $\alpha_2$ and select the $\alpha_i$ at which the bias is the least". This is a
novel property of the FTE.
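
A minimal sketch of how the roots of equation (3.10) might be found numerically is given below. It rewrites $\phi(\alpha) = V$ as the cubic $(1 - V)\,C - f(1 + V)\,B - V A = 0$ and hands the coefficients to NumPy. The function names are assumptions of this sketch; the inputs $V = 0.9491$ and $f = 60/284$ are taken from the proportional-allocation case of the empirical study in Section 4.

```python
import numpy as np

def phi(alpha, f):
    """phi(alpha) = (C - f B) / (A + f B + C) of the factor-type estimator."""
    A = (alpha - 1) * (alpha - 2)
    B = (alpha - 1) * (alpha - 4)
    C = (alpha - 2) * (alpha - 3) * (alpha - 4)
    return (C - f * B) / (A + f * B + C)

def alpha_roots(v, f):
    """Real roots of phi(alpha) = v, sorted in increasing order."""
    a_poly = np.poly([1, 2])      # coefficients of (a - 1)(a - 2)
    b_poly = np.poly([1, 4])      # coefficients of (a - 1)(a - 4)
    c_poly = np.poly([2, 3, 4])   # coefficients of (a - 2)(a - 3)(a - 4)
    # phi = v  <=>  (1 - v) C - f (1 + v) B - v A = 0
    cubic = np.polysub((1 - v) * c_poly,
                       np.polyadd(f * (1 + v) * b_poly, v * a_poly))
    roots = np.roots(cubic)
    return sorted(float(r.real) for r in roots if abs(r.imag) < 1e-9)

v, f = 0.9491, 60 / 284
roots = alpha_roots(v, f)
```

With these inputs the three real roots should land close to the values of $\alpha_{opt}$ reported in Section 4; the largest root is sensitive to the rounding of $V$, because the leading coefficient $1 - V$ is small. The bias (3.7) can then be evaluated at each root to pick the one with the least bias.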

3.4 Reducing MSE through Appropriate Choice of $\alpha$

By using FTE to define the separate-type estimators in this chapter, we gain the
advantage that the MSE of the estimator can be reduced to a desired extent by an
appropriate choice of the parameter $\alpha$, even if the non-response rate in the
population is high. The procedure is described below.

Since the MSEs of the proposed strategies are functions of the unknown parameter
$\alpha$ as well as of the non-response rates $W_{i2}$, it is obvious that if $\alpha$ is
held constant, the MSEs increase with increasing non-response rate, provided the other
characteristics of the population, along with the sub-sampling ratio $L_i$ in the
non-response class, remain unchanged. It is also true that the higher the non-response
rate, the larger the non-response group in the sample; therefore, in order to lower the
MSE of the estimator, the size of the sub-sample should be increased so as to keep the
value of $L_i$ in the vicinity of 1. But this would, in turn, cost more, because more effort
and money would be required to obtain information on the sub-sampled units through
personal interviews. Thus, increasing the size of the sub-sample in order to reduce the
MSE is not a feasible solution if the non-response rate is large.

The classical estimators such as $T_{0HH}$, $T^*_R$, $T^*_P$, discussed earlier in the
literature in the presence of non-response, are not helpful in reducing the MSE to a
desired level. In all these estimators, the only controlling factor for lowering the MSE
is $L_i$.

By utilizing FTE to propose separate-type estimators in the present work, we are able
to control the precision of the estimator to a desired level simply by making an
appropriate choice of $\alpha$.

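Let the non-response rate and the mean square of the non-response group in the $i$th
stratum at a given time be $W_{i2} = N_{i2}/N_i$ and $S_{0i2}^2$ respectively. Then, for a
choice of $\alpha = \alpha_0$, the MSE of the estimator would be

$$M[T_{FS}(\alpha)\,|\,W_{i2}] = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2\left[S_{0i}^2 + \phi(\alpha_0)^2 R_{01i}^2 S_{1i}^2 - 2\phi(\alpha_0) R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}\right] + \sum_{i=1}^{k}\frac{L_i - 1}{n_i}\,W_{i2}\,p_i^2\,S_{0i2}^2. \qquad (3.11)$$

Let us now suppose that the non-response rate has increased over time and is now
$W'_{i2} = N'_{i2}/N_i$, such that $N'_{i2} > N_{i2}$. Obviously, with the change in
non-response rate, only the parameter $S_{0i2}^2$ will change. Let it become
$S'^{2}_{0i2}$. Then, for a choice of $\alpha = \alpha_1$, we have

$$M[T_{FS}(\alpha)\,|\,W'_{i2}] = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2\left[S_{0i}^2 + \phi(\alpha_1)^2 R_{01i}^2 S_{1i}^2 - 2\phi(\alpha_1) R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}\right] + \sum_{i=1}^{k}\frac{L_i - 1}{n_i}\,W'_{i2}\,p_i^2\,S'^{2}_{0i2}. \qquad (3.12)$$

Clearly, if $\alpha_0 = \alpha_1$ and $S'^{2}_{0i2} > S_{0i2}^2$, then
$M[T_{FS}(\alpha)\,|\,W'_{i2}] > M[T_{FS}(\alpha)\,|\,W_{i2}]$. Therefore, we have to select a
suitable value $\alpha_1$ such that, even if $W'_{i2} > W_{i2}$ and
$S'^{2}_{0i2} > S_{0i2}^2$, expression (3.12) becomes equal to expression (3.11); that is,
the MSE of $T_{FS}(\alpha)$ is reduced to the desired level given by (3.11). Equating (3.11)
to (3.12) and solving for $\phi(\alpha_1)$, we get

$$\phi(\alpha_1)^2\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}^2 S_{1i}^2 - 2\phi(\alpha_1)\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}$$
$$\quad - \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2\left\{\phi(\alpha_0)^2 R_{01i}^2 S_{1i}^2 - 2\phi(\alpha_0) R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}\right\} + \sum_{i=1}^{k}\frac{L_i - 1}{n_i}\,p_i^2\left(W'_{i2}\,S'^{2}_{0i2} - W_{i2}\,S_{0i2}^2\right) = 0, \qquad (3.13)$$
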

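which is a quadratic equation in $\phi(\alpha_1)$. On solving the above equation, the roots
are obtained as

$$\phi(\alpha_1) = \frac{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}}{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}^2 S_{1i}^2} \pm \left[\left(\frac{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}}{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}^2 S_{1i}^2}\right)^2 \right.$$
$$\left. \quad + \frac{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2\left\{\phi(\alpha_0)^2 R_{01i}^2 S_{1i}^2 - 2\phi(\alpha_0) R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}\right\} - \sum_{i=1}^{k}\frac{L_i - 1}{n_i}\,p_i^2\left(W'_{i2}\,S'^{2}_{0i2} - W_{i2}\,S_{0i2}^2\right)}{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}^2 S_{1i}^2}\right]^{1/2} \qquad (3.14)$$
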

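The above equation provides the value of $\alpha$ at which the desired level of
precision is obtained. Sometimes the roots given by the above equation may be imaginary.
In order that the roots be real, the conditions on the value of $\alpha_0$ are given by

$$\phi(\alpha_0) > \frac{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}}{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}^2 S_{1i}^2} + \left[\frac{\sum_{i=1}^{k}\frac{L_i - 1}{n_i}\,p_i^2\left(W'_{i2}\,S'^{2}_{0i2} - W_{i2}\,S_{0i2}^2\right)}{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}^2 S_{1i}^2}\right]^{1/2} \qquad (3.15)$$

and

$$\phi(\alpha_0) < \frac{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}\,\rho_{01i}\,S_{0i}\,S_{1i}}{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}^2 S_{1i}^2} - \left[\frac{\sum_{i=1}^{k}\frac{L_i - 1}{n_i}\,p_i^2\left(W'_{i2}\,S'^{2}_{0i2} - W_{i2}\,S_{0i2}^2\right)}{\sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2 R_{01i}^2 S_{1i}^2}\right]^{1/2}. \qquad (3.16)$$
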
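
The quadratic (3.13) and the real-root conditions can be sketched in a few lines. Writing $a$ for the sum multiplying $\phi(\alpha_1)^2$, $b$ for the sum multiplying $2\phi(\alpha_1)$ and $d$ for the non-response sum, the discriminant simplifies algebraically to $(\phi(\alpha_0) - V)^2 - d/a$ with $V = b/a$, which is the form used below. The function name and argument names are assumptions of this sketch. The example inputs are normalised ($a = 1$), with $d/a$ back-solved from the real-root bounds 1.1527 and 0.7454 reported in Section 4 under proportional allocation; they are illustrative inputs, not values quoted by the authors.

```python
import math

def phi_alpha1(a_sum, b_sum, d_sum, phi0):
    """Roots of the quadratic (3.13) in phi(alpha_1), or None if they are complex.

    a_sum : sum of (1/n_i - 1/N_i) p_i^2 R_01i^2 S_1i^2
    b_sum : sum of (1/n_i - 1/N_i) p_i^2 R_01i rho_01i S_0i S_1i
    d_sum : sum of (L_i - 1)/n_i * p_i^2 (W'_i2 S'^2_0i2 - W_i2 S^2_0i2)
    phi0  : phi(alpha_0), the value used before the non-response rate changed
    """
    v = b_sum / a_sum
    # Equivalent to v^2 + (phi0^2 a - 2 phi0 b - d) / a in equation (3.14).
    discriminant = (phi0 - v) ** 2 - d_sum / a_sum
    if discriminant < 0:
        return None  # phi0 violates both (3.15) and (3.16)
    half_width = math.sqrt(discriminant)
    return v + half_width, v - half_width

# Normalised illustrative inputs (see the assumptions stated above).
upper, lower = phi_alpha1(a_sum=1.0, b_sum=0.9491, d_sum=0.20365 ** 2, phi0=1.20)
no_real_root = phi_alpha1(a_sum=1.0, b_sum=0.9491, d_sum=0.20365 ** 2, phi0=1.00)
```

The two roots are symmetric about $V$, which is why their average in the empirical study equals the optimum value of $\phi(\alpha)$.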



4. Empirical Study 

In this section, we illustrate the results derived above on the basis of some
empirical data. For this purpose, we consider the MU284 population available in
Särndal et al. (1992, page 652, Appendix B). We have taken the population in the year
1985 as the study variable and that in the year 1975 as the auxiliary variable. There are
284 municipalities, which have been divided randomly into four strata of sizes 73, 70, 97
and 44.

Table 1 shows the values of the parameters of the population under consideration
for the four strata, which are needed in the computational procedure.



Table 1: Parameters of the Population

Stratum   Stratum    Mean               Mean
($i$)     size       ($\bar X_{0i}$)    ($\bar X_{1i}$)    $S_{0i}^2$    $S_{1i}^2$    $S_{0i}$    $S_{1i}$    $\rho_{01i}$    $S_{0i2}^2$
          ($N_i$)
1         73         40.85              39.56              6369.0999     6624.4398     79.8066     81.3907     0.999           618.8844
2         70         27.83              27.57              1051.0725     1147.0111     32.4202     33.8676     0.998           240.9050
3         97         25.79              25.44              2014.9651     2205.4021     44.8884     46.9617     0.999           265.5220
4         44         20.64              20.36              538.4749      485.2655      23.2051     22.0287     0.997           83.6944



The value of $R_{01} = \bar X_0 / \bar X_1$ comes out to be 1.0192.

We fix the sample size to be 60. The allocation of the sample to the different strata
under proportional and Neyman allocations is shown in the following table.

Table 2: Allocation of Sample

Stratum   Size of samples under
($i$)     Proportional allocation   Neyman allocation
1         15                        26
2         15                        10
3         21                        19
4         9                         5



On the basis of equation (3.10), we obtained the optimum values of $\alpha$:

Under Proportional Allocation

$\phi(\alpha) = 0.9491$, $\alpha_{opt} = (31.9975, 2.6128, 1.12)$ and

Under Neyman Allocation

$\phi(\alpha) = 0.9527$, $\alpha_{opt} = (34.1435, 2.6114, 1.1123)$.

The following table depicts the values of the MSEs of the estimators $T_{FS}(\alpha)$
for $\alpha_{opt}$, $\alpha = 1$ and $\alpha = 4$ under proportional and Neyman allocations.
A comparison of the MSE of $T_{FS}(\alpha)$ at $\alpha_{opt}$ and at $\alpha = 1$ with that at
$\alpha = 4$ reveals that the use of auxiliary information at the estimation stage
certainly improves the efficiency of the estimator as compared to the usual mean
estimator $T^*_{0st}$.

Table 3: MSE Comparison ($L_i = 2$, $W_{i2} = 10\%$ for all $i$)

MSE                            Proportional allocation   Neyman allocation
$M[T_{FS}(\alpha)]_{opt}$      0.6264                    0.6015
$M[T_{FS}(1)]$                 0.7270                    0.6705
$M[T_{FS}(4)]$                 35.6069                   28.6080
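
The proportional-allocation column of Table 3 can be approximated with a short sketch of the MSE expression (3.8), using the stratum parameters of Table 1. Two points are assumptions of this sketch rather than statements from the text: the unrounded allocation $n_i = n\,p_i$ is used instead of the integer sizes of Table 2, and $\phi(\alpha)$ is passed directly, with $\phi = 1$ for $\alpha = 1$ and $\phi = 0$ for $\alpha = 4$ as follows from the definitions of $A$, $B$ and $C$.

```python
# Stratum parameters from Table 1 (MU284 population, four strata).
N_I = [73, 70, 97, 44]
X0 = [40.85, 27.83, 25.79, 20.64]
X1 = [39.56, 27.57, 25.44, 20.36]
S0 = [79.8066, 32.4202, 44.8884, 23.2051]
S1 = [81.3907, 33.8676, 46.9617, 22.0287]
RHO = [0.999, 0.998, 0.999, 0.997]
S02_SQ = [618.8844, 240.9050, 265.5220, 83.6944]
N = sum(N_I)

def mse_fs(phi, n_i, L=2.0, W2=0.1):
    """First-order MSE of T_FS, equation (3.8), with common L_i and W_i2."""
    total = 0.0
    for Ni, x0, x1, s0, s1, rho, s2sq, ni in zip(N_I, X0, X1, S0, S1, RHO, S02_SQ, n_i):
        p, R = Ni / N, x0 / x1
        sampling = s0 ** 2 + (phi * R * s1) ** 2 - 2 * phi * R * rho * s0 * s1
        total += (1 / ni - 1 / Ni) * p ** 2 * sampling
        total += (L - 1) / ni * W2 * p ** 2 * s2sq  # extra term due to non-response
    return total

def optimum_phi(n_i):
    """V of equation (3.10): the value of phi that minimises the MSE."""
    num = den = 0.0
    for Ni, x0, x1, s0, s1, rho, ni in zip(N_I, X0, X1, S0, S1, RHO, n_i):
        w, R = (1 / ni - 1 / Ni) * (Ni / N) ** 2, x0 / x1
        num += w * R * rho * s0 * s1
        den += w * (R * s1) ** 2
    return num / den

n_prop = [60 * Ni / N for Ni in N_I]          # unrounded proportional allocation
mse_opt = mse_fs(optimum_phi(n_prop), n_prop)
mse_ratio = mse_fs(1.0, n_prop)               # alpha = 1, separate ratio estimator
mse_mean = mse_fs(0.0, n_prop)                # alpha = 4, usual mean estimator
```

Almost all of the MSE of the mean estimator comes from the sampling term, whereas for the ratio and optimum estimators the non-response term dominates, which is why the choice of $\alpha$ matters so much less once auxiliary information is used.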



We shall now illustrate how, by an appropriate choice of $\alpha$, the MSE of the
estimators $T_{FS}(\alpha)$ can be reduced to a desired level even if the non-response rate
is increased.

Let us take $L_i = 2$, $W_{i2} = 0.1$, $W'_{i2} = 0.3$ and
$S'^{2}_{0i2} = \tfrac{4}{3}\,S_{0i2}^2$ for all $i$.

Under Proportional Allocation

From the conditions (3.15) and (3.16), we have the conditions for real roots of
$\phi(\alpha_1)$ as

$\phi(\alpha_0) > 1.1527$ and $\phi(\alpha_0) < 0.7454$.

Therefore, if we take $\phi(\alpha_0) = 1.20$, then for this choice of $\phi(\alpha_0)$, we get

$M[T_{FS}(\alpha)\,|\,W_{i2}] = 3.0712$ and $M[T_{FS}(\alpha)\,|\,W'_{i2}] = 4.6818$.

Thus, there is about a 52 percent increase in the MSE of the estimator if the
non-response rate is tripled. Now using (3.14), we get $\phi(\alpha_1) = 1.0957$ and 0.8025.
At these values of $\phi(\alpha_1)$, $M[T_{FS}(\alpha)\,|\,W'_{i2}]$ reduces to 3.0712 even
if the non-response rate is 30 percent. Thus a suitable choice of $\alpha$ may be made in
order to reduce the MSE to a desired level.
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Under Neyman Allocation 

Conditions for real roots of $\phi(\alpha_1)$:

$\phi(\alpha_0) > 1.1746$ and $\phi(\alpha_0) < 0.7309$.

If $\phi(\alpha_0) = 1.20$ then we have
$M[T_{FS}(\alpha)\,|\,W_{i2}] = 2.4885$ and $M[T_{FS}(\alpha)\,|\,W'_{i2}] = 4.0072$.

Further, we get from (3.14), $\phi(\alpha_1) = 1.0620$ and 0.8435, so that
$M[T_{FS}(\alpha)\,|\,W'_{i2}] = 2.4885$ for $\phi(\alpha_1) = 1.0620$.

5. Conclusion 

We have suggested a general family of factor-type estimators for estimating the
population mean in stratified random sampling under non-response using an auxiliary
variable. The optimum property of the family has been discussed. We have also discussed
the choice of an appropriate estimator from the family in order to get a desired level of
accuracy even if non-response is high. Table 3 reveals that the optimum estimator of the
suggested family has greater precision than the separate ratio and sample mean
estimators. In addition, the reduction of the MSE of the estimators $T_{FS}(\alpha)$ to a
desired extent by an appropriate choice of the parameter $\alpha$, even if the non-response
rate in the population is high, has also been illustrated.
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Abstract 

This chapter presents the detailed discussion on the effect of non-response on the 
estimator of population mean in a frequently used design, namely, stratified random 
sampling. In this chapter, our aim is to discuss the existing allocation schemes in 
presence of non-response and to suggest some new allocation schemes utilizing the 
knowledge of response and non-response rates of different strata. The effects of proposed 
schemes on the sampling variance of the estimator have been discussed and compared 
with the usual allocation schemes, namely, proportional allocation and Neyman 
allocation in presence of non-response. The empirical study has also been carried out in 
support of the results. 

Keywords: Stratified random sampling, Allocation schemes, Non-response, Mean 
squares, Empirical Study. 

1. Introduction 

Sukhatme (1935) has shown that, when optimum allocation is used in stratified
sampling, estimates of the strata variances obtained in a previous survey or in a
specially planned pilot survey, even one based on samples of moderate size, would be
adequate for increasing the precision of the estimator. Evans (1951) has also considered
the problem of allocation based on estimates of strata variances obtained in an earlier
survey. In the literature of sampling theory, various efforts have been made to reduce the
error that arises because only a part of the population is observed, i.e., the sampling
error. Besides the sampling error, there are also several non-sampling errors, which
arise from factors such as faulty methods of selection and estimation, incomplete
coverage, differences between interviewers, lack of proper supervision, etc.
Incompleteness or non-response in the form of absence, censoring or grouping is a
troubling issue in many data sets.

In choosing the sample sizes for the different strata in stratified random sampling,
one can make them either proportional to the strata sizes alone (proportional allocation)
or proportional to the strata sizes together with the variation within the strata (Neyman
allocation). If non-response is inherent in the entire population, and hence in all the
strata, it would be practically impossible to adopt Neyman allocation, because the
knowledge of stratum variability will then not be available; the response rates of the
different strata, on the other hand, may be readily available or may be easily estimated
from the sample selected from each stratum. Thus, it is quite reasonable to utilize the
response rate (or non-response rate) while allocating the sample to strata, instead of
Neyman allocation, in the presence of non-response error.

In the present chapter, we have proposed some new allocation schemes in 
selecting the samples from different strata based on response (non-response) rates of the 
strata in presence of non-response. We have compared them with Neyman and 
proportional allocations. The results have been shown with a numerical example. 

2. Sampling Strategy and Estimation Procedure 

In the study of non-response, according to one deterministic response model, it is
generally assumed that the population is dichotomized into two strata: a response stratum
consisting of all units for which measurements would be obtained if the units happened
to fall in the sample, and a non-response stratum of units for which no measurement
would be obtained. This division into two strata is, of course, an oversimplification of
the problem. The theory involved in the Hansen-Hurwitz (HH) technique is as given below.

Let us consider a sample of size $n$ drawn from a finite population of size $N$.
Let $n_1$ units in the sample respond and $n_2$ units not respond, so that
$n_1 + n_2 = n$. The $n_1$ units may be regarded as a sample from the response class and
the $n_2$ units as a sample from the non-response class of the population. Let $N_1$ and
$N_2$ be the numbers of units in the response stratum and non-response stratum
respectively in the population. Obviously, $N_1$ and $N_2$ are not known, but their
unbiased estimates can be obtained from the sample as

$$\hat N_1 = n_1 N / n; \qquad \hat N_2 = n_2 N / n.$$

Let $m$ be the size of the sub-sample from the $n_2$ non-respondents to be interviewed.
Hansen and Hurwitz (1946) proposed an estimator of the population mean $\bar X_0$ of the
study variable $X_0$ as

$$T_{0HH} = \frac{n_1\,\bar x_{01} + n_2\,\bar x_{0m}}{n} \qquad (2.1)$$

which is unbiased for $\bar X_0$, where $\bar x_{01}$ and $\bar x_{0m}$ are the sample means
based on samples of sizes $n_1$ and $m$ respectively for the study variable $X_0$.

The variance of $T_{0HH}$ is given by

$$V(T_{0HH}) = \left(\frac{1}{n} - \frac{1}{N}\right)S_0^2 + \frac{L - 1}{n}\,W_2\,S_{02}^2, \qquad (2.2)$$

where $L = n_2/m$, $W_2 = N_2/N$, and $S_0^2$ and $S_{02}^2$ are the mean squares of the
entire group and the non-response group respectively in the population.



Let us consider a population consisting of $N$ units divided into $k$ strata. Let the
size of the $i$th stratum be $N_i$ ($i = 1, 2, \dots, k$), and suppose we decide to select a
sample of size $n$ from the entire population in such a way that $n_i$ units are selected
from the $i$th stratum. Thus, we have $\sum_{i=1}^{k} n_i = n$.

Let non-response occur in each stratum. Then, using the Hansen and Hurwitz
procedure, we select a sub-sample of $m_i$ units out of the $n_{i2}$ non-respondent units
in the $i$th stratum by simple random sampling without replacement (SRSWOR), such that
$n_{i2} = L_i m_i$, $L_i \geq 1$, and the information is observed on all the $m_i$ units by
the interview method.

The Hansen-Hurwitz estimator of the population mean $\bar X_{0i}$ for the $i$th stratum
will be

$$T^*_{0i} = \frac{n_{i1}\,\bar x_{0i1} + n_{i2}\,\bar x_{0m_i}}{n_i}, \qquad (i = 1, 2, \dots, k) \qquad (2.3)$$

where $\bar x_{0i1}$ and $\bar x_{0m_i}$ are the sample means based on the $n_{i1}$ respondent
units and the $m_i$ sub-sampled non-respondent units respectively in the $i$th stratum.

Obviously $T^*_{0i}$ is an unbiased estimator of $\bar X_{0i}$. Combining the estimators
over all strata, we get the estimator of the population mean $\bar X_0$, given by

$$T^*_{0st} = \sum_{i=1}^{k} p_i\,T^*_{0i} \qquad (2.4)$$

where $p_i = N_i/N$.

Obviously, we have

$$E[T^*_{0st}] = \bar X_0. \qquad (2.5)$$

The variance of $T^*_{0st}$ is given by

$$V[T^*_{0st}] = \sum_{i=1}^{k}\left(\frac{1}{n_i} - \frac{1}{N_i}\right)p_i^2\,S_{0i}^2 + \sum_{i=1}^{k}\frac{L_i - 1}{n_i}\,W_{i2}\,p_i^2\,S_{0i2}^2 \qquad (2.6)$$

where $W_{i2} = N_{i2}/N_i$, and $S_{0i}^2$ and $S_{0i2}^2$ are the mean squares of the entire
group and the non-response group respectively in the $i$th stratum.

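It is easy to see that under 'proportional allocation' (PA), that is, when
$n_i = n p_i$ for $i = 1, 2, \dots, k$, $V[T^*_{0st}]$ is obtained as

$$V[T^*_{0st}]_{PA} = \left(\frac{1}{n} - \frac{1}{N}\right)\sum_{i=1}^{k} p_i\,S_{0i}^2 + \frac{1}{n}\sum_{i=1}^{k}(L_i - 1)\,W_{i2}\,p_i\,S_{0i2}^2, \qquad (2.7)$$

whereas under the 'Neyman allocation' (NA), with
$n_i = \dfrac{n\,p_i\,S_{0i}}{\sum_{i=1}^{k} p_i\,S_{0i}}$ ($i = 1, 2, \dots, k$), it is equal to

$$V[T^*_{0st}]_{NA} = \frac{1}{n}\left(\sum_{i=1}^{k} p_i\,S_{0i}\right)^2 - \frac{1}{N}\sum_{i=1}^{k} p_i\,S_{0i}^2 + \frac{1}{n}\left(\sum_{i=1}^{k} p_i\,S_{0i}\right)\sum_{i=1}^{k}(L_i - 1)\,W_{i2}\,p_i\,\frac{S_{0i2}^2}{S_{0i}}. \qquad (2.8)$$
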

It is important to mention here that the last terms in the expressions (2.7) and (2.8)
arise due to non-response in the population. Further, in the presence of non-response in
the population, Neyman allocation may or may not be more efficient than proportional
allocation, a situation which is quite contrary to the usual case when the population is
free from non-response. This can be understood from the following.

We have

$$V[T^*_{0st}]_{PA} - V[T^*_{0st}]_{NA} = \frac{1}{n}\sum_{i=1}^{k} p_i\left(S_{0i} - \bar S_w\right)^2 + \frac{1}{n}\sum_{i=1}^{k}(L_i - 1)\,p_i\,W_{i2}\,S_{0i2}^2\left(1 - \frac{\bar S_w}{S_{0i}}\right), \qquad (2.9)$$

where $\bar S_w = \sum_{i=1}^{k} p_i\,S_{0i}$.

While the first term in the above expression is necessarily positive, the second
term may be negative and greater than the first term in magnitude, depending upon the
sign and magnitude of the term $\left(1 - \dfrac{\bar S_w}{S_{0i}}\right)$ for all $i$. Thus,
in the presence of non-response in the stratified population, Neyman allocation does not
always guarantee a better result, as is the case when the population is free from
non-response error.
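
The decomposition in (2.9) can be checked numerically by evaluating the general variance (2.6) under both allocations. In the sketch below the stratum sizes and mean squares are those of the MU284 strata used in the empirical study of this chapter, and the $L_i$ values are the ones quoted with its Table 2; the non-response rates `W2` are hypothetical, chosen only to exercise the formula, and unrounded sample sizes are used.

```python
import math

N_I = [73, 70, 97, 44]
S0_SQ = [6369.10, 1051.07, 2014.97, 538.47]    # stratum mean squares
S02_SQ = [5095.28, 840.86, 1611.97, 430.78]    # non-response group mean squares
L = [2.0, 2.5, 1.5, 3.5]
W2 = [0.10, 0.20, 0.15, 0.25]                  # hypothetical non-response rates
n = 60
N = sum(N_I)
p = [Ni / N for Ni in N_I]
S0 = [math.sqrt(v) for v in S0_SQ]

def variance(n_i):
    """Variance of T*_0st for a given allocation, equation (2.6)."""
    sampling = sum((1 / ni - 1 / Ni) * pi ** 2 * s
                   for ni, Ni, pi, s in zip(n_i, N_I, p, S0_SQ))
    nonresponse = sum((Li - 1) / ni * w * pi ** 2 * s2
                      for Li, ni, w, pi, s2 in zip(L, n_i, W2, p, S02_SQ))
    return sampling + nonresponse

s_bar = sum(pi * s for pi, s in zip(p, S0))               # weighted mean of S_0i
n_pa = [n * pi for pi in p]                               # proportional allocation
n_na = [n * pi * s / s_bar for pi, s in zip(p, S0)]       # Neyman allocation

lhs = variance(n_pa) - variance(n_na)
first = sum(pi * (s - s_bar) ** 2 for pi, s in zip(p, S0)) / n
second = sum((Li - 1) * pi * w * s2 * (1 - s_bar / s)
             for Li, pi, w, s2, s in zip(L, p, W2, S02_SQ, S0)) / n
rhs = first + second
```

The sign of `second` depends on which strata carry the heaviest non-response: strata with $S_{0i}$ below the weighted mean contribute negatively, which is the mechanism by which Neyman allocation can lose its advantage.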



3. Some New Allocation Schemes 



It is a well known fact that when the stratified population is free from
non-response error and the strata mean squares, $S_{0i}^2$ ($i = 1, 2, \dots, k$), are
known, it is always advisable to prefer the Neyman allocation scheme over the
proportional allocation scheme in order to increase the precision of the estimator. But
if the population is affected by non-response, Neyman allocation is not always a better
proposition. This has been highlighted in Section 2 above. Moreover, when non-response
is present in the strata, knowledge of the strata mean squares, $S_{0i}^2$, is impossible
to collect; rather, direct estimates of $S_{0i1}^2$ and $S_{0i2}^2$ may be had from the
sample. Under these circumstances it is, therefore, practically difficult to adopt Neyman
allocation if non-response is inherent in the population. Proportional allocation,
however, does not demand knowledge of the strata mean squares and rests only upon the
strata sizes; hence it remains applicable even in the presence of non-response.

As discussed in Section 2, unbiased estimates of the response and non-response
rates in the population are readily available, and hence it seems quite reasonable to
think of developing allocation schemes which involve the knowledge of population
response (non-response) rates in each stratum. If such allocation schemes yield more
precise estimates than proportional allocation, it would be advisable to adopt them
instead of Neyman allocation, for the reasons mentioned above.

In this section, we have, therefore, proposed some new allocation schemes which
utilize the knowledge of response (non-response) rates in subpopulations. While some of
the proposed schemes do not utilize the knowledge of $S_{0i}^2$, some others are based on
the knowledge of $S_{0i}^2$, just in order to compare them with Neyman allocation in the
presence of non-response. In addition to the assumptions of proportional and Neyman
allocations, we have further assumed it logical to allocate a larger sample to a stratum
having a larger number of respondents, and vice versa, when proposing the new schemes of
allocation.

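Scheme-1 [OA (1)]:

Let us assume that a larger sample is selected from a larger stratum and from a
stratum with a larger response rate, that is,

$$n_i \propto p_i\,W_{i1} \quad \text{for } i = 1, 2, \dots, k.$$

Then we have $n_i = K p_i W_{i1}$, where $K$ is a constant. The value of $K$ will be

$$K = \frac{n}{\sum_{i=1}^{k} p_i\,W_{i1}}.$$

Thus we have

$$n_i = \frac{n\,p_i\,W_{i1}}{\sum_{i=1}^{k} p_i\,W_{i1}}. \qquad (3.1)$$

Putting this value of $n_i$ in expression (2.6), we get

$$V[T^*_{0st}]_1 = \frac{1}{n}\left(\sum_{i=1}^{k} p_i\,W_{i1}\right)\sum_{i=1}^{k}\left[\frac{p_i\,S_{0i}^2}{W_{i1}} + \frac{(L_i - 1)\,W_{i2}\,p_i\,S_{0i2}^2}{W_{i1}}\right] - \frac{1}{N}\sum_{i=1}^{k} p_i\,S_{0i}^2. \qquad (3.2)$$

Scheme-2 [OA (2)]:

Let us assume that

$$n_i \propto p_i\,W_{i1}\,S_{0i}.$$

Then, we have

$$n_i = \frac{n\,p_i\,W_{i1}\,S_{0i}}{\sum_{i=1}^{k} p_i\,W_{i1}\,S_{0i}} \qquad (3.3)$$

and hence the expression (2.6) becomes

$$V[T^*_{0st}]_2 = \frac{1}{n}\left(\sum_{i=1}^{k} p_i\,W_{i1}\,S_{0i}\right)\sum_{i=1}^{k}\left[\frac{p_i\,S_{0i}}{W_{i1}} + \frac{(L_i - 1)\,W_{i2}\,p_i\,S_{0i2}^2}{W_{i1}\,S_{0i}}\right] - \frac{1}{N}\sum_{i=1}^{k} p_i\,S_{0i}^2. \qquad (3.4)$$

Scheme-3 [OA (3)]:

Let us select a larger sample from a larger stratum but a smaller sample if the
non-response rate is high. That is,

$$n_i \propto \frac{p_i}{W_{i2}}.$$

Then

$$n_i = \frac{n\,p_i}{W_{i2}\sum_{i=1}^{k}\dfrac{p_i}{W_{i2}}} \qquad (3.5)$$

and the expression of $V[T^*_{0st}]$ reduces to

$$V[T^*_{0st}]_3 = \frac{1}{n}\left(\sum_{i=1}^{k}\frac{p_i}{W_{i2}}\right)\sum_{i=1}^{k}\left[W_{i2}\,p_i\,S_{0i}^2 + (L_i - 1)\,W_{i2}^2\,p_i\,S_{0i2}^2\right] - \frac{1}{N}\sum_{i=1}^{k} p_i\,S_{0i}^2. \qquad (3.6)$$

Scheme-4 [OA (4)]:

Let

$$n_i \propto \frac{p_i\,S_{0i}}{W_{i2}};$$

then

$$n_i = \frac{n\,p_i\,S_{0i}}{W_{i2}\sum_{i=1}^{k}\dfrac{p_i\,S_{0i}}{W_{i2}}}. \qquad (3.7)$$

The corresponding expression of $V[T^*_{0st}]$ is

$$V[T^*_{0st}]_4 = \frac{1}{n}\left(\sum_{i=1}^{k}\frac{p_i\,S_{0i}}{W_{i2}}\right)\sum_{i=1}^{k}\left[W_{i2}\,p_i\,S_{0i} + \frac{(L_i - 1)\,W_{i2}^2\,p_i\,S_{0i2}^2}{S_{0i}}\right] - \frac{1}{N}\sum_{i=1}^{k} p_i\,S_{0i}^2. \qquad (3.8)$$

Scheme-5 [OA (5)]:

Let

$$n_i \propto \frac{p_i\,W_{i1}}{W_{i2}};$$

then

$$n_i = \frac{n\,p_i\,W_{i1}}{W_{i2}\sum_{i=1}^{k}\dfrac{p_i\,W_{i1}}{W_{i2}}}. \qquad (3.9)$$

The expression (2.6) gives

$$V[T^*_{0st}]_5 = \frac{1}{n}\left(\sum_{i=1}^{k}\frac{p_i\,W_{i1}}{W_{i2}}\right)\sum_{i=1}^{k}\left[\frac{W_{i2}\,p_i\,S_{0i}^2}{W_{i1}} + \frac{(L_i - 1)\,W_{i2}^2\,p_i\,S_{0i2}^2}{W_{i1}}\right] - \frac{1}{N}\sum_{i=1}^{k} p_i\,S_{0i}^2. \qquad (3.10)$$

Scheme-6 [OA (6)]:

If

$$n_i \propto \frac{p_i\,W_{i1}\,S_{0i}}{W_{i2}},$$

then, we have

$$n_i = \frac{n\,p_i\,W_{i1}\,S_{0i}}{W_{i2}\sum_{i=1}^{k}\dfrac{p_i\,W_{i1}\,S_{0i}}{W_{i2}}}. \qquad (3.11)$$

In this case, $V[T^*_{0st}]$ becomes

$$V[T^*_{0st}]_6 = \frac{1}{n}\left(\sum_{i=1}^{k}\frac{p_i\,W_{i1}\,S_{0i}}{W_{i2}}\right)\sum_{i=1}^{k}\left[\frac{W_{i2}\,p_i\,S_{0i}}{W_{i1}} + \frac{(L_i - 1)\,W_{i2}^2\,p_i\,S_{0i2}^2}{W_{i1}\,S_{0i}}\right] - \frac{1}{N}\sum_{i=1}^{k} p_i\,S_{0i}^2. \qquad (3.12)$$
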

Remark 1: It is to be mentioned here that if the response rate assumes the same value in
all the strata, that is, $W_{i1} = W$ (say), then schemes 1, 3 and 5 reduce to
'proportional allocation', while schemes 2, 4 and 6 reduce to 'Neyman allocation'. The
corresponding expressions $V[T^*_{0st}]_r$ ($r = 1, 3, 5$) then reduce to
$V[T^*_{0st}]_{PA}$, and $V[T^*_{0st}]_r$ ($r = 2, 4, 6$) reduce to $V[T^*_{0st}]_{NA}$.

Remark 2: A theoretical comparison of the expressions $V[T^*_{0st}]_r$ ($r = 1, 3, 5$) and
$V[T^*_{0st}]_r$ ($r = 2, 4, 6$) with $V[T^*_{0st}]_{PA}$ and $V[T^*_{0st}]_{NA}$ respectively
is required in order to understand the suitability of the proposed schemes, but such
comparisons do not yield explicit solutions in general. The suitability of a scheme
depends upon the parametric values of the population. We have, therefore, illustrated the
results with the help of some empirical data.
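
The six allocation rules, and the reduction described in Remark 1, can be sketched as follows. The function name is hypothetical, the sample sizes are left unrounded, and the sketch assumes that the response and non-response rates of a stratum sum to one ($W_{i1} + W_{i2} = 1$).

```python
import math

def allocate(scheme, n, p, W1, S0):
    """Sample sizes n_i under scheme OA(1)..OA(6); returns unrounded values.

    p  : stratum weights N_i / N
    W1 : response rates W_i1 (the non-response rate is taken as 1 - W_i1)
    S0 : square roots of the stratum mean squares, S_0i
    """
    W2 = [1.0 - w for w in W1]
    rules = {
        1: lambda pi, w1, w2, s: pi * w1,
        2: lambda pi, w1, w2, s: pi * w1 * s,
        3: lambda pi, w1, w2, s: pi / w2,
        4: lambda pi, w1, w2, s: pi * s / w2,
        5: lambda pi, w1, w2, s: pi * w1 / w2,
        6: lambda pi, w1, w2, s: pi * w1 * s / w2,
    }
    weights = [rules[scheme](pi, w1, w2, s) for pi, w1, w2, s in zip(p, W1, W2, S0)]
    total = sum(weights)
    return [n * w / total for w in weights]

# MU284 strata used in the empirical study; the common response rate is hypothetical.
N_I = [73, 70, 97, 44]
p = [Ni / sum(N_I) for Ni in N_I]
S0 = [math.sqrt(v) for v in [6369.10, 1051.07, 2014.97, 538.47]]
n = 60
equal_W1 = [0.8] * 4

proportional = [n * pi for pi in p]
s_bar = sum(pi * s for pi, s in zip(p, S0))
neyman = [n * pi * s / s_bar for pi, s in zip(p, S0)]
odd_schemes = [allocate(r, n, p, equal_W1, S0) for r in (1, 3, 5)]
even_schemes = [allocate(r, n, p, equal_W1, S0) for r in (2, 4, 6)]
```

With unequal response rates the schemes diverge: OA(1) to OA(2) reward strata that respond well, OA(3) to OA(4) penalise strata with high non-response, and OA(5) to OA(6) do both.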







4. Empirical Study 



In order to investigate the efficiency of the estimator $T^*_{0st}$ under the proposed
allocation schemes, based on response (non-response) rates, we have considered here an
empirical data set.

We have taken the data available in Särndal et al. (1992), given in Appendix B.
The data refer to 284 municipalities in Sweden, varying considerably in size and other
characteristics. The population consisting of the 284 municipalities is referred to as the
MU284 population.

For the purpose of illustration, we have randomly divided the 284 municipalities 
into four strata consisting of 73, 70, 97 and 44 municipalities. The 1985 population (in 
thousands) has been considered as the study variable, X 0 . 

On the basis of the data, the following values of parameters were obtained: 

Table 1: Particulars of the Data ($N = 284$)

Stratum   Size      Stratum mean       Stratum mean square   Mean square of the non-response group
($i$)     ($N_i$)   ($\bar X_{0i}$)    ($S_{0i}^2$)          ($S_{0i2}^2 = \tfrac{4}{5}\,S_{0i}^2$)
1         73        40.85              6369.10               5095.28
2         70        27.83              1051.07               840.86
3         97        25.78              2014.97               1611.97
4         44        20.64              538.47                430.78




We have taken sample size $n = 60$.

Table 2 depicts the values of the sample sizes $n_i$ ($i = 1, 2, 3, 4$) and the values of
$V[T^*_{0st}]$ under PA, NA and the proposed schemes OA(1) to OA(6) for different
selections of the values of $L_i$ and $W_{i2}$ ($i = 1, 2, 3, 4$).
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Non- 

response 

Rate 

(w i2 ) 

(Percent) 



Table 2 

Sample Sizes and Variance of T* st under Different Allocation Schemes 
(Z =2.0, 2.5, 1.5, 3.5 for i= 1, 2, 3, 4 respectively) 

Sample Size {n i ) and v[tL I under 



OA(l) 



OA(2) 



OA(3) 



OA(4) 



OA(5) 



rtC] 


", 


v[tI,\ 


", 


vVi,\ 


", 


vVi \ 


", 


v[rl\ 


", 


v\r„„\ 


", 


36.04 


17 


41.02 


28 


116.59 


20 


38.43 


31 


38.43 


22 


37.85 


33 




15 




10 




15 




10 




15 




10 




20 




18 




18 




16 




17 




14 




8 




4 




7 




3 




6 




3 


37.27 


14 


49.17 


24 


117.37 


12 


55.41 


21 


39.07 


10 


60.76 


19 




14 




10 




13 




10 




13 




10 




21 




21 

5 




22 




22 




23 




24 




10 








13 










7 
















7 




14 






36.30 


16 


43.40 


27 


116.54 


16 


44.15 


27 


37.76 


16 


44.69 


27 




16 




11 




19 




13 




21 




14 




20 






18 

n 






17 




16 




8 




18 






17 




6 




3 



35.99 17 41.32 28 115.40 20 39.45 32 38.82 



22 39.73 

16 
14 




5. Concluding Remarks 

In the present chapter, our aim was to accommodate the non-response error
inherent in the stratified population during the estimation procedure and hence to suggest
some new allocation schemes which utilize the knowledge of response (non-response)
rates of strata. As discussed in different sub-sections, Neyman allocation may sometimes
produce less precise estimates of the population mean than proportional allocation if
non-response is present in the population. Moreover, Neyman allocation is sometimes
impractical in such a situation, since neither will the knowledge of $S_{0i}^2$
($i = 1, 2, 3, 4$), the mean squares of the strata, be available, nor could these be
estimated easily from the sample. In contrast, what might be easily known or could be
estimated from the sample are the response (non-response) rates of the different strata.
It was, therefore, thought appropriate to propose some new allocation schemes depending
upon response (non-response) rates.

A look at Table 2 reveals that in most of the situations (under different
combinations of $W_{i2}$ and $L_i$), allocation schemes OA (1), OA (3) and OA (5), depending
solely upon the knowledge of $p_i$ and $W_{i2}$ (or $W_{i1}$), produce more precise estimates
than PA. Further, as far as a comparative study of schemes OA (1), OA (3) and OA (5) is
concerned, all these schemes are more or less similar in terms of their efficiency. Thus,
in addition to the knowledge of the strata sizes, $p_i$, the knowledge of response
(non-response) rates, $W_{i1}$ (or $W_{i2}$), while allocating the sample to different
strata, certainly adds to the precision of the estimate.

It is also evident from the table that the additional information on the mean
squares of strata certainly adds to the precision of the estimate, but this contribution is
not very significant in comparison to NA. Scheme OA (2) is throughout worse than
any other scheme.
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Abstract 

In this chapter, we have suggested an improved estimator for estimating the
population mean in stratified sampling in the presence of auxiliary information. The mean
square error (MSE) of the proposed estimator has been derived under large sample
approximation. Besides, considering the minimum case of the MSE equation, the
efficiency conditions between the proposed and existing estimators are obtained. These
theoretical findings are supported by a numerical example.

Keywords: Auxiliary variable, mean square errors, exponential ratio type estimates,
stratified random sampling.

1. Introduction 

In planning surveys, stratified sampling has often proved useful in improving the 
precision of unstratified sampling strategies for estimating the finite population mean 

$$\bar{Y} = \frac{1}{N}\sum_{h=1}^{L}\sum_{i=1}^{N_h} y_{hi}.$$

Consider a finite population of size N. Let y and x, respectively, be the study and 
auxiliary variates on each unit $U_j$ (j = 1, 2, 3, ..., N) of the population U. Let the population be 
divided into L strata with the h-th stratum containing $N_h$ units, h = 1, 2, 3, ..., L, so that 
$\sum_{h=1}^{L} N_h = N$. Suppose that a simple random sample of size $n_h$ is drawn without 
replacement (SRSWOR) from the h-th stratum such that $\sum_{h=1}^{L} n_h = n$. 

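When the population mean $\bar{X}$ of the auxiliary variable x is known, Hansen et al. 
(1946) suggested a "combined ratio estimator" 

$$\bar{y}_{CR} = \bar{y}_{st}\left(\frac{\bar{X}}{\bar{x}_{st}}\right) \tag{1.1}$$

where $\bar{y}_{st} = \sum_{h=1}^{L} w_h \bar{y}_h$, $\bar{x}_{st} = \sum_{h=1}^{L} w_h \bar{x}_h$, 

$\bar{y}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} y_{hi}$ and $\bar{x}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} x_{hi}$, 

$w_h = \frac{N_h}{N}$ and $\bar{X} = \sum_{h=1}^{L} w_h \bar{X}_h$. 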

The "combined product estimator" for $\bar{Y}$ is defined by 

$$\bar{y}_{CP} = \bar{y}_{st}\left(\frac{\bar{x}_{st}}{\bar{X}}\right) \tag{1.2}$$



To the first degree of approximation, the mean square errors (MSE) of $\bar{y}_{CR}$ and $\bar{y}_{CP}$ are 
respectively given by 

$$MSE(\bar{y}_{CR}) = \sum_{h=1}^{L} w_h^2\theta_h\left[S_{yh}^2 + R^2 S_{xh}^2 - 2RS_{yxh}\right] \tag{1.3}$$

$$MSE(\bar{y}_{CP}) = \sum_{h=1}^{L} w_h^2\theta_h\left[S_{yh}^2 + R^2 S_{xh}^2 + 2RS_{yxh}\right] \tag{1.4}$$

where $\theta_h = \left(\frac{1}{n_h} - \frac{1}{N_h}\right)$, $R = \bar{Y}/\bar{X}$ is the population ratio, $S_{yh}^2$ is the population variance of the 
variate of interest in stratum h, $S_{xh}^2$ is the population variance of the auxiliary variate in 
stratum h and $S_{yxh}$ is the population covariance between the auxiliary variate and the variate of 
interest in stratum h. 

Following Bahl and Tuteja (1991), Singh et al. (2009) proposed the following 
estimator in stratified random sampling: 

$$\bar{y}_{er} = \bar{y}_{st}\exp\left[\frac{\bar{X} - \bar{x}_{st}}{\bar{X} + \bar{x}_{st}}\right] \tag{1.5}$$

The MSE of $\bar{y}_{er}$, to the first degree of approximation, is given by 

$$MSE(\bar{y}_{er}) = \sum_{h=1}^{L} w_h^2\theta_h\left[S_{yh}^2 + \frac{R^2}{4}S_{xh}^2 - RS_{yxh}\right] \tag{1.6}$$

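Using the estimators $\bar{y}_{CR}$ and $\bar{y}_{CP}$, Singh and Vishwakarma (2005) suggested the 
combined ratio-product estimator for estimating $\bar{Y}$ as 

$$\bar{y}_{RPC} = \bar{y}_{st}\left[\alpha\frac{\bar{X}}{\bar{x}_{st}} + (1-\alpha)\frac{\bar{x}_{st}}{\bar{X}}\right] \tag{1.7}$$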



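For the optimum value $\alpha = \frac{1}{2}(1 + C^*) = \alpha_0$ (say), the minimum MSE of the estimator 
$\bar{y}_{RPC}$ is given by 

$$\min MSE(\bar{y}_{RPC}) = \sum_{h=1}^{L} w_h^2\theta_h(1 - \rho^{*2})S_{yh}^2 \tag{1.8}$$

where $C^* = \dfrac{cov(\bar{y}_{st}, \bar{x}_{st})}{R\,V(\bar{x}_{st})}$, $\rho^* = \dfrac{cov(\bar{y}_{st}, \bar{x}_{st})}{\sqrt{V(\bar{y}_{st})V(\bar{x}_{st})}}$ and $R = \dfrac{\bar{Y}}{\bar{X}}$. 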
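The estimators (1.1), (1.2) and (1.5) can be sketched in code as follows. This is a minimal illustration, not part of the original chapter: the function name `stratified_estimators`, the dictionary layout and the two-stratum sample are hypothetical and chosen only to show how the stratum weights $w_h = N_h/N$ enter each estimator.

```python
import math

def stratified_estimators(strata, X_bar):
    """Combined estimators of the population mean from a stratified sample.

    strata: list of dicts with stratum size "N_h" and sampled values "y", "x"
            (hypothetical layout, used only for this sketch).
    X_bar:  known population mean of the auxiliary variable.
    """
    N = sum(s["N_h"] for s in strata)
    # Stratum-weighted sample means with weights w_h = N_h / N.
    y_st = sum(s["N_h"] / N * sum(s["y"]) / len(s["y"]) for s in strata)
    x_st = sum(s["N_h"] / N * sum(s["x"]) / len(s["x"]) for s in strata)
    return {
        "y_st": y_st,                                             # stratified mean
        "y_CR": y_st * X_bar / x_st,                              # combined ratio, (1.1)
        "y_CP": y_st * x_st / X_bar,                              # combined product, (1.2)
        "y_er": y_st * math.exp((X_bar - x_st) / (X_bar + x_st)), # exponential ratio, (1.5)
    }

# Hypothetical two-stratum sample (illustrative numbers only).
strata = [
    {"N_h": 60, "y": [12.0, 15.0, 11.0], "x": [4.0, 5.0, 3.0]},
    {"N_h": 40, "y": [30.0, 26.0], "x": [9.0, 7.0]},
]
est = stratified_estimators(strata, X_bar=6.0)
for name, value in est.items():
    print(f"{name}: {value:.4f}")
```

In this illustrative sample the auxiliary sample mean falls below the known $\bar{X}$, so the ratio-type estimators adjust $\bar{y}_{st}$ upward and the product estimator adjusts it downward, with the exponential version making the milder correction.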



2. Proposed estimator 

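Following Singh and Vishwakarma (2005), we propose a new family of 
estimators 

$$t = \lambda\,\bar{y}_{st}\left[\exp\left(\frac{\bar{X} - \bar{x}_{st}}{\bar{X} + \bar{x}_{st}}\right)\right]^{\alpha}\left(\frac{\bar{X}}{\bar{x}_{st}}\right)^{\beta} + (1 - \lambda)\,\bar{y}_{st}\left[\exp\left(\frac{\bar{x}_{st} - \bar{X}}{\bar{x}_{st} + \bar{X}}\right)\right]^{\alpha}\left(\frac{\bar{x}_{st}}{\bar{X}}\right)^{\beta} \tag{2.1}$$

where $\lambda$ is a real constant to be determined such that the MSE of t is a minimum, and $\alpha$, $\beta$ 
are real constants such that $\beta = 1 - \alpha$. 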

Remark 2.1: For $\lambda = 1$ and $\alpha = 1$ the estimator t reduces to the exponential ratio estimator of Singh et al. (2009). 
For $\lambda = 1$ and $\alpha = 0$ the estimator t takes the form of the Hansen et al. (1946) estimator $\bar{y}_{CR}$. 
For $\lambda = 0$ and $\alpha = 1$ the estimator t reduces to the exponential product estimator of Singh et al. (2009). For $\lambda = 0$ and 
$\alpha = 0$ the estimator t takes the form of the estimator $\bar{y}_{CP}$. 

To obtain the MSE of t to the first degree of approximation, we write 
$\bar{y}_{st} = \sum_{h=1}^{L} w_h\bar{y}_h = \bar{Y}(1 + e_0)$ and 
$\bar{x}_{st} = \sum_{h=1}^{L} w_h\bar{x}_h = \bar{X}(1 + e_1)$ 
such that 

$E(e_0) = E(e_1) = 0$. 

Under SRSWOR, we have 
$E(e_0^2) = \frac{1}{\bar{Y}^2}\sum_{h=1}^{L} w_h^2\theta_h S_{yh}^2$ 
$E(e_1^2) = \frac{1}{\bar{X}^2}\sum_{h=1}^{L} w_h^2\theta_h S_{xh}^2$ 

$E(e_0 e_1) = \frac{1}{\bar{X}\bar{Y}}\sum_{h=1}^{L} w_h^2\theta_h S_{yxh}$ 

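Expressing equation (2.1) in terms of e's we have 

$$t = \bar{Y}(1 + e_0)\left[\lambda\left\{\exp\left(-\frac{e_1}{2}\left(1 + \frac{e_1}{2}\right)^{-1}\right)\right\}^{\alpha}\left\{(1 + e_1)^{-1}\right\}^{1-\alpha} + (1 - \lambda)\left\{\exp\left(\frac{e_1}{2}\left(1 + \frac{e_1}{2}\right)^{-1}\right)\right\}^{\alpha}(1 + e_1)^{1-\alpha}\right] \tag{2.2}$$

We now assume that $|e_1| < 1$ so that we may expand $(1 + e_1)^{-1}$ as a series in powers of 
$e_1$. Expanding the right hand side of (2.2) to the first order of approximation, we obtain 

$$(t - \bar{Y}) = \bar{Y}\left[e_0 + e_1\left(1 + \alpha\lambda - \frac{\alpha}{2} - 2\lambda\right)\right] \tag{2.3}$$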

Squaring both sides of (2.3) and then taking expectations, we get the MSE of the 
estimator t, to the first order of approximation, as 

$$MSE(t) = V(\bar{y}_{st}) + R^2(1 - 2\lambda)\sum_{h=1}^{L} w_h^2\theta_h S_{xh}^2\left\{(1 - 2\lambda)A^2 + 2C^*A\right\} \tag{2.4}$$

where $A = \left(1 - \frac{\alpha}{2}\right)$. 

Minimisation of (2.4) with respect to $\lambda$ yields its optimum value as 

$$\lambda = \frac{1}{2}\left(1 + \frac{C^*}{A}\right) = \lambda_0 \text{ (say)} \tag{2.5}$$

Putting $\lambda = \lambda_0$ in (2.4) we get the minimum MSE of the estimator t as 

$$\min MSE(t) = V(\bar{y}_{st})(1 - \rho^{*2}) = \sum_{h=1}^{L} w_h^2\theta_h(1 - \rho^{*2})S_{yh}^2. \tag{2.6}$$
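The step from (2.4) to (2.6) can be checked numerically. The sketch below is an illustration under stated assumptions: `V_y` and `V_x` stand for $V(\bar{y}_{st})$ and $V(\bar{x}_{st}) = \sum w_h^2\theta_h S_{xh}^2$, `cov_yx` for $cov(\bar{y}_{st}, \bar{x}_{st})$, and the numerical values are hypothetical, not taken from the chapter.

```python
def mse_t(lam, alpha, V_y, V_x, cov_yx, R):
    """First-order MSE of t, equation (2.4), with A = 1 - alpha/2."""
    A = 1.0 - alpha / 2.0
    C_star = cov_yx / (R * V_x)
    u = 1.0 - 2.0 * lam
    return V_y + R**2 * u * V_x * (u * A**2 + 2.0 * C_star * A)

def lambda_opt(alpha, V_x, cov_yx, R):
    """Optimum lambda_0 = (1 + C*/A) / 2, equation (2.5)."""
    A = 1.0 - alpha / 2.0
    C_star = cov_yx / (R * V_x)
    return 0.5 * (1.0 + C_star / A)

# Hypothetical design-level quantities (illustrative only).
V_y, V_x, cov_yx, R = 4.0, 1.0, 1.5, 2.0
rho_sq = cov_yx**2 / (V_y * V_x)     # squared rho* as defined after (1.8)
bound = V_y * (1.0 - rho_sq)         # right-hand side of (2.6)

for alpha in (0.0, 1.0):
    lam0 = lambda_opt(alpha, V_x, cov_yx, R)
    print(alpha, lam0, mse_t(lam0, alpha, V_y, V_x, cov_yx, R), bound)
```

For both choices of $\alpha$ the MSE evaluated at $\lambda_0$ coincides with $V(\bar{y}_{st})(1 - \rho^{*2})$, which illustrates that the minimum in (2.6) does not depend on $\alpha$.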

3. Efficiency comparisons 

In this section we compare the proposed estimator with the estimators 
mentioned above and obtain the conditions under which the proposed estimator 
performs better than each of them. 

First we compare the proposed estimator with the simple mean in stratified random 
sampling. 

MSE(t) < MSE($\bar{y}_{st}$), if 

$$V(\bar{y}_{st}) + R^2(1 - 2\lambda)\sum_{h=1}^{L} w_h^2\theta_h S_{xh}^2\left\{(1 - 2\lambda)A^2 + 2C^*A\right\} < V(\bar{y}_{st})$$

or, if 

$$\min\left\{\frac{1}{2}, \frac{1}{2} + \frac{C^*}{A}\right\} < \lambda < \max\left\{\frac{1}{2}, \frac{1}{2} + \frac{C^*}{A}\right\}.$$

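Next we compare the proposed estimator with the combined ratio estimator. 
MSE(t) < MSE($\bar{y}_{CR}$), if 

$$V(\bar{y}_{st}) + \sum_{h=1}^{L} w_h^2\theta_h R^2(1 - 2\lambda)S_{xh}^2\left\{(1 - 2\lambda)A^2 + 2C^*A\right\} < \sum_{h=1}^{L} w_h^2\theta_h\left[S_{yh}^2 + R^2S_{xh}^2 - 2RS_{yxh}\right]$$

or, if $(1 - 2C^*) - (1 - 2\lambda)\left\{(1 - 2\lambda)A^2 + 2C^*A\right\} > 0$ 

or, if $\lambda$ lies between $\frac{1}{2}\left\{\frac{2C^* + A - 1}{A}\right\}$ and $\frac{1}{2}\left\{\frac{A + 1}{A}\right\}$. 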

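Next we compare the efficiency of the proposed estimator with the combined product estimator. 
MSE(t) < MSE($\bar{y}_{CP}$), if 

$$V(\bar{y}_{st}) + \sum_{h=1}^{L} w_h^2\theta_h R^2(1 - 2\lambda)S_{xh}^2\left\{(1 - 2\lambda)A^2 + 2C^*A\right\} < \sum_{h=1}^{L} w_h^2\theta_h\left[S_{yh}^2 + R^2S_{xh}^2 + 2RS_{yxh}\right]$$

or, if $(1 + 2C^*) - (1 - 2\lambda)\left\{(1 - 2\lambda)A^2 + 2C^*A\right\} > 0$ 

or, if $\lambda$ lies between $\frac{1}{2}\left\{\frac{A - 1}{A}\right\}$ and $\frac{1}{2}\left\{\frac{2C^* + A + 1}{A}\right\}$. 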



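Next we compare the efficiency of the proposed estimator with the exponential ratio estimator in 
stratified sampling. 
MSE(t) < MSE($\bar{y}_{ER}$), if 

$$V(\bar{y}_{st}) + \sum_{h=1}^{L} w_h^2\theta_h R^2(1 - 2\lambda)S_{xh}^2\left\{(1 - 2\lambda)A^2 + 2C^*A\right\} < \sum_{h=1}^{L} w_h^2\theta_h\left[S_{yh}^2 + \frac{R^2}{4}S_{xh}^2 - RS_{yxh}\right]$$

or, if $(1 - 4C^*) - 4(1 - 2\lambda)\left\{(1 - 2\lambda)A^2 + 2C^*A\right\} > 0$ 

or, if $\lambda$ lies between $\frac{1}{2}\left\{\frac{4C^* + 2A - 1}{2A}\right\}$ and $\frac{1}{2}\left\{\frac{2A + 1}{2A}\right\}$. 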

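Finally we compare the efficiency of the proposed estimator with the exponential product estimator 
in stratified random sampling. 

MSE(t) < MSE($\bar{y}_{EP}$), if 

$$V(\bar{y}_{st}) + \sum_{h=1}^{L} w_h^2\theta_h R^2(1 - 2\lambda)S_{xh}^2\left\{(1 - 2\lambda)A^2 + 2C^*A\right\} < \sum_{h=1}^{L} w_h^2\theta_h\left[S_{yh}^2 + \frac{R^2}{4}S_{xh}^2 + RS_{yxh}\right]$$

or, if $(1 + 4C^*) - 4(1 - 2\lambda)\left\{(1 - 2\lambda)A^2 + 2C^*A\right\} > 0$ 

or, if $\lambda$ lies between $\frac{1}{2}\left\{\frac{2A - 1}{2A}\right\}$ and $\frac{1}{2}\left\{\frac{4C^* + 2A + 1}{2A}\right\}$. 

Whenever the above conditions are satisfied, the proposed estimator performs better than 
the other estimators mentioned. 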
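The five conditions of this section can be explored numerically. The following is a sketch under stated assumptions: each comparison is reduced to the bracketed term of (2.4), $g(\lambda) = (1 - 2\lambda)\{(1 - 2\lambda)A^2 + 2C^*A\}$, being smaller than a competitor-specific threshold, and the interval end points are the roots of the resulting quadratic in $\lambda$. The function names and the values $C^* = 0.8$, $\alpha = 0.5$ are hypothetical.

```python
def gain(lam, C_star, alpha):
    """Bracketed term of (2.4): (MSE(t) - V(y_st)) / (R^2 V(x_st))."""
    A = 1.0 - alpha / 2.0
    u = 1.0 - 2.0 * lam
    return u * (u * A**2 + 2.0 * C_star * A)

def thresholds(C_star):
    """(MSE(competitor) - V(y_st)) / (R^2 V(x_st)) for each competitor."""
    return {
        "y_st": 0.0,
        "y_CR": 1.0 - 2.0 * C_star,
        "y_CP": 1.0 + 2.0 * C_star,
        "y_ER": 0.25 - C_star,
        "y_EP": 0.25 + C_star,
    }

def better_than_ranges(C_star, alpha):
    """Open intervals of lambda on which t beats each competitor."""
    A = 1.0 - alpha / 2.0

    def interval(a, b):
        return (min(a, b), max(a, b))

    return {
        "y_st": interval(0.5, 0.5 + C_star / A),
        "y_CR": interval(0.5 * (2 * C_star + A - 1) / A, 0.5 * (A + 1) / A),
        "y_CP": interval(0.5 * (A - 1) / A, 0.5 * (2 * C_star + A + 1) / A),
        "y_ER": interval(0.5 * (4 * C_star + 2 * A - 1) / (2 * A),
                         0.5 * (2 * A + 1) / (2 * A)),
        "y_EP": interval(0.5 * (2 * A - 1) / (2 * A),
                         0.5 * (4 * C_star + 2 * A + 1) / (2 * A)),
    }

# Hypothetical values of C* and alpha, for illustration only.
C_star, alpha = 0.8, 0.5
for name, (lo, hi) in better_than_ranges(C_star, alpha).items():
    mid = 0.5 * (lo + hi)
    inside = gain(mid, C_star, alpha) < thresholds(C_star)[name]
    print(f"{name}: ({lo:.4f}, {hi:.4f})  beats competitor at midpoint: {inside}")
```

Writing each interval with `min` and `max` keeps the bounds correctly ordered whatever the sign and size of $C^*$, which is why the sketch does not hard-code which end point is the lower one.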

4. Numerical illustration 

All the theoretical results are supported by using the data given in Singh and 
Vishwakarma (2005). 

Data statistics: 

Stratum    $w_h$     $\theta_h$    $S_{xh}^2$    $S_{yh}^2$    $S_{yxh}$
2          0.5227    0.12454       132.66        259113.71     5709.16
3          0.2428    0.08902       38.44         65885.60      1404.71

R = 49.03 and $\lambda_{opt}$ = 0.9422 ($\alpha$ = 0) and 1.384525 ($\alpha$ = 1) 

Using the above data, the percentage relative efficiencies (PRE) of the different 
estimators $\bar{y}_{CR}$, $\bar{y}_{CP}$, $\bar{y}_{ER}$, $\bar{y}_{EP}$ and the proposed estimator t with respect to $\bar{y}_{st}$ have been calculated. 

Table 4.1: PRE of different estimators of $\bar{Y}$ 

Estimator                 PRE
$\bar{y}_{st}$            100
$\bar{y}_{CR}$            1148.256
$\bar{y}_{CP}$            23.326
$\bar{y}_{ER}$            405.222
$\bar{y}_{EP}$            42.612
$\bar{y}_{HPS(opt)}$      1403.317
$\bar{y}_{PRP(opt)}$      1403.317



We have also shown the range of $\lambda$ for which the proposed estimator performs better 
than $\bar{y}_{st}$. 

Table 4.2: Range of $\lambda$ for which the proposed estimator performs better than $\bar{y}_{st}$ 

Value of constant $\alpha$    Form of proposed estimator    Range of $\lambda$
$\alpha$ = 0                  $\bar{y}_{HPS}$               (0.5, 1.3)
$\alpha$ = 1                  $\bar{y}_{CER}$               (0.5, 2.2)
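The reported optimum values and the ranges in Table 4.2 can be cross-checked with a few lines of code. This is a sketch under an explicit assumption: the value of $C^*$ is not printed in the chapter, so it is inferred here from the reported $\lambda_{opt} = 1.384525$ at $\alpha = 1$ through $\lambda_0 = \frac{1}{2}(1 + C^*/A)$ with $A = 1/2$, which gives $C^* = 0.884525$.

```python
import math

# Assumption: C* is inferred from the reported lambda_opt at alpha = 1,
# since lambda_0 = 0.5 * (1 + C*/A) and A = 1 - alpha/2 = 0.5 there.
C_star = 1.384525 - 0.5

results = {}
for alpha in (0, 1):
    A = 1.0 - alpha / 2.0
    lam0 = 0.5 * (1.0 + C_star / A)   # optimum lambda, equation (2.5)
    upper = 0.5 + C_star / A          # upper end of the range versus y_st
    results[alpha] = (lam0, upper)
    print(f"alpha = {alpha}: lambda_opt = {lam0:.6f}, "
          f"range = (0.5, {math.floor(upper * 10) / 10})")
```

Under this assumption the computed optimum for $\alpha = 0$ is about 0.9423, in line with the reported 0.9422, and the upper end points, about 1.38 and 2.27, agree with the tabulated 1.3 and 2.2 when cut to one decimal place.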



5. Conclusion 

From the theoretical discussion and empirical study we conclude that the 
proposed estimator under optimum conditions performs better than the other estimators 
considered in the article. The relative efficiencies of the various estimators are listed in Table 
4.1, and the range of $\lambda$ for which the proposed estimator performs better than $\bar{y}_{st}$ is given in 
Table 4.2. 
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Abstract 

This chapter proposes some estimators for the population variance of the variable 
under study, which make use of information regarding the population proportion 
possessing a certain attribute. Under the simple random sampling without replacement 
(SRSWOR) scheme, the mean squared error (MSE) up to the first order of approximation 
is derived. The results are illustrated numerically using an empirical 
population considered in the literature. 

Keywords: Auxiliary attribute, exponential ratio-type estimates, simple random 
sampling, mean square error, efficiency. 

1. Introduction 

It is well known that auxiliary information is used in the theory of sampling to 
increase the efficiency of estimators of population parameters. The ratio, 
regression and product methods of estimation are good examples in this context. There 
exist situations when information is available in the form of an attribute which is highly 
correlated with y. Taking into consideration the point biserial correlation coefficient 
between the auxiliary attribute and the study variable, several authors including Naik and 
Gupta (1996), Jhajj et al. (2006), Shabbir and Gupta (2007), Singh et al. (2007, 2008) 
and Abd-Elfattah et al. (2010) defined ratio estimators of the population mean when 
prior information on the population proportion of units possessing the attribute is 
available. 

In many situations, the problem of estimating the population variance $S_y^2$ of the study 
variable y assumes importance. When prior information on parameters of auxiliary 
variable(s) is available, Das and Tripathi (1978), Isaki (1983), Prasad and Singh (1990), 
Kadilar and Cingi (2006, 2007) and Singh et al. (2007) have suggested various 
estimators of $S_y^2$. 

In this chapter we propose a family of estimators for the population variance 
$S_y^2$ when one of the variables is in the form of an attribute. For the main results we confine 
ourselves to the sampling scheme SRSWOR, ignoring the finite population correction. 

2. The proposed estimators and their properties 

Following Isaki (1983), we propose a ratio estimator 

$$t_1 = s_y^2\,\frac{S_\phi^2}{s_\phi^2} \tag{2.1}$$

Next we propose a regression estimator for the population variance 

$$t_2 = s_y^2 + b\left(S_\phi^2 - s_\phi^2\right) \tag{2.2}$$

And following Singh et al. (2009), we propose another estimator 

$$t_3 = s_y^2\exp\left[\frac{S_\phi^2 - s_\phi^2}{S_\phi^2 + s_\phi^2}\right] \tag{2.3}$$

where $s_y^2$ and $s_\phi^2$ are unbiased estimators of the population variances $S_y^2$ and $S_\phi^2$ 
respectively, and b is a constant which makes the MSE of the estimator minimum. 
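The three estimators (2.1) to (2.3) can be computed from a sample as sketched below. This is only an illustration: the function name `variance_estimators`, the four-unit sample, the value of $S_\phi^2$ and the constant b are hypothetical, and in practice b would be chosen to minimise the variance of $t_2$.

```python
import math
from statistics import variance

def variance_estimators(y, phi, S_phi2, b):
    """Ratio (t1), regression (t2) and exponential ratio (t3) estimators
    of the population variance, using a 0/1 auxiliary attribute phi.

    S_phi2 is the known population variance of the attribute and b is the
    constant of the regression estimator (both supplied by the user).
    """
    s_y2 = variance(y)        # unbiased sample variance of y
    s_phi2 = variance(phi)    # unbiased sample variance of the attribute
    t1 = s_y2 * S_phi2 / s_phi2
    t2 = s_y2 + b * (S_phi2 - s_phi2)
    t3 = s_y2 * math.exp((S_phi2 - s_phi2) / (S_phi2 + s_phi2))
    return t1, t2, t3

# Hypothetical sample: phi marks which units possess the attribute.
y = [1.0, 2.0, 3.0, 4.0]
phi = [0.0, 1.0, 0.0, 1.0]
t1, t2, t3 = variance_estimators(y, phi, S_phi2=0.25, b=2.0)
print(t1, t2, t3)
```

When the sample variance of the attribute equals its population value, all three estimators reduce to $s_y^2$; the further the two differ, the larger the adjustment, with $t_3$ adjusting more gently than $t_1$.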

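To obtain the bias and MSE, we write 
$s_y^2 = S_y^2(1 + e_0)$, $s_\phi^2 = S_\phi^2(1 + e_1)$ 

such that $E(e_0) = E(e_1) = 0$ 

and $E(e_0^2) = \dfrac{\delta_{40} - 1}{n}$, $E(e_1^2) = \dfrac{\delta_{04} - 1}{n}$, $E(e_0 e_1) = \dfrac{\delta_{22} - 1}{n}$, 

where $\delta_{pq} = \dfrac{\mu_{pq}}{\mu_{20}^{p/2}\,\mu_{02}^{q/2}}$, $\mu_{pq} = \dfrac{\sum_{i=1}^{N}(y_i - \bar{Y})^p(\phi_i - P)^q}{N - 1}$, 

$\beta_2(y) = \dfrac{\mu_{40}}{\mu_{20}^2} = \delta_{40}$ and $\beta_2(\phi) = \dfrac{\mu_{04}}{\mu_{02}^2} = \delta_{04}$. 

Let $\beta_2^*(y) = \beta_2(y) - 1$, $\beta_2^*(\phi) = \beta_2(\phi) - 1$ and $\delta_{pq}^* = \delta_{pq} - 1$. 
P is the proportion of units in the population possessing the attribute. 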

Now the estimator $t_1$ defined in (2.1) can be written as 

$$(t_1 - S_y^2) = S_y^2\left(e_0 - e_1 + e_1^2 - e_0 e_1\right) \tag{2.4}$$

Similarly, the estimator $t_2$ can be written as 

$$(t_2 - S_y^2) = S_y^2 e_0 - bS_\phi^2 e_1 \tag{2.5}$$

And the estimator $t_3$ can be written as 

$$(t_3 - S_y^2) = S_y^2\left[e_0 - \frac{e_1}{2} - \frac{e_0 e_1}{2} + \frac{3}{8}e_1^2\right] \tag{2.6}$$

The MSEs of $t_1$ and $t_3$, to the first order of approximation, are given, respectively, as 

$$MSE(t_1) = \frac{S_y^4}{n}\left[\beta_2^*(y) + \beta_2^*(\phi) - 2\delta_{22}^*\right] \tag{2.7}$$

$$MSE(t_3) = \frac{S_y^4}{n}\left[\beta_2^*(y) + \frac{\beta_2^*(\phi)}{4} - \delta_{22}^*\right] \tag{2.8}$$



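The variance of $t_2$ is given as 

$$V(t_2) = \frac{1}{n}\left[S_y^4(\delta_{40} - 1) + b^2 S_\phi^4(\delta_{04} - 1) - 2bS_y^2 S_\phi^2(\delta_{22} - 1)\right] \tag{2.9}$$

On differentiating (2.9) with respect to b and equating to zero we obtain 

$$b = \frac{S_y^2(\delta_{22} - 1)}{S_\phi^2(\delta_{04} - 1)} \tag{2.10}$$

Substituting the optimum value of b in (2.9), we get the minimum variance of the 
estimator $t_2$ as 

$$\min V(t_2) = \frac{S_y^4}{n}\,\beta_2^*(y)\left[1 - \frac{\delta_{22}^{*2}}{\beta_2^*(y)\,\beta_2^*(\phi)}\right] = Var(s_y^2)\left(1 - \rho^2_{(s_y^2,\, s_\phi^2)}\right) \tag{2.11}$$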



3. Adapted estimator 

We adapt the Shabbir and Gupta (2007) and Grover (2010) estimators to the case when 
one of the variables is in the form of an attribute and propose the estimator $t_4$ 

$$t_4 = \left[k_1 s_y^2 + k_2\left(S_\phi^2 - s_\phi^2\right)\right]\exp\left(\frac{S_\phi^2 - s_\phi^2}{S_\phi^2 + s_\phi^2}\right) \tag{3.1}$$

where $k_1$ and $k_2$ are suitably chosen constants. 

Expressing equation (3.1) in terms of e's and retaining only terms up to the second degree of 
e's, we have: 

$$t_4 = \left[k_1 S_y^2(1 + e_0) - k_2 S_\phi^2 e_1\right]\left[1 - \frac{e_1}{2} + \frac{3}{8}e_1^2\right] \tag{3.2}$$
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Up to first order of approximation, the mean square error of t 4 is 



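$$MSE(t_4) = E\left(t_4 - S_y^2\right)^2 = S_y^4\left[(k_1 - 1)^2 + \lambda k_1^2\left\{\beta_2^*(y) + \beta_2^*(\phi) - 2\delta_{22}^*\right\} + \lambda k_1\left\{\delta_{22}^* - \frac{3}{4}\beta_2^*(\phi)\right\}\right] + \lambda S_\phi^4 k_2^2\,\beta_2^*(\phi) + 2\lambda S_y^2 S_\phi^2\left[k_1 k_2\left(\beta_2^*(\phi) - \delta_{22}^*\right) - \frac{k_2}{2}\beta_2^*(\phi)\right] \tag{3.3}$$

where $\lambda = \dfrac{1}{n}$. 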
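On partially differentiating (3.3) with respect to $k_i$ (i = 1, 2) we get the optimum values of $k_1$ 
and $k_2$ respectively as 

$$k_1^* = \frac{\beta_2^*(\phi)\left(2 - \frac{\lambda}{4}\beta_2^*(\phi)\right)}{2\left\{\beta_2^*(\phi)(\lambda A + 1) - \lambda B^2\right\}} \tag{3.4}$$

and 

$$k_2^* = \frac{S_y^2\left[\beta_2^*(\phi)(\lambda A + 1) - \lambda B^2 - B\left(2 - \frac{\lambda}{4}\beta_2^*(\phi)\right)\right]}{2S_\phi^2\left\{\beta_2^*(\phi)(\lambda A + 1) - \lambda B^2\right\}} \tag{3.5}$$

where 

$A = \beta_2^*(y) + \beta_2^*(\phi) - 2\delta_{22}^*$ and $B = \beta_2^*(\phi) - \delta_{22}^*$. 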

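On substituting these optimum values of $k_1$ and $k_2$ in (3.3), we get the minimum value 
of the MSE of $t_4$ as 

$$MSE(t_4)_{opt} = \frac{MSE(t_2)}{1 + \dfrac{MSE(t_2)}{S_y^4}} - \frac{\lambda\beta_2^*(\phi)\left(MSE(t_2) + \dfrac{\lambda S_y^4\beta_2^*(\phi)}{16}\right)}{4\left(1 + \dfrac{MSE(t_2)}{S_y^4}\right)} \tag{3.6}$$

where MSE($t_2$) denotes the minimum variance of $t_2$ given in (2.11). 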
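The optimisation leading to (3.4) to (3.6) can be verified numerically. The sketch below is an illustration, not the authors' computation: it evaluates the MSE expression (3.3) at the optimum constants and compares the result with the closed form (3.6). The function names are hypothetical, and the input values are those reported in the empirical study of this chapter (n = 23, $S_y^2$ = 4.074, $S_\phi^2$ = 0.110, $\delta_{40}$ = 3.811, $\delta_{04}$ = 6.162, $\delta_{22}$ = 3.996).

```python
def starred(d40, d04, d22):
    """Return beta2*(y), beta2*(phi), delta22* and the constants A, B."""
    by, bp, ds = d40 - 1.0, d04 - 1.0, d22 - 1.0
    return by, bp, ds, by + bp - 2.0 * ds, bp - ds

def mse_t4(k1, k2, n, Sy2, Sphi2, d40, d04, d22):
    """MSE of t4 for given k1, k2, equation (3.3), with lambda = 1/n."""
    lam = 1.0 / n
    by, bp, ds, A, B = starred(d40, d04, d22)
    return (Sy2**2 * ((k1 - 1.0)**2 + lam * k1**2 * A
                      + lam * k1 * (ds - 0.75 * bp))
            + lam * Sphi2**2 * k2**2 * bp
            + 2.0 * lam * Sy2 * Sphi2 * (k1 * k2 * B - 0.5 * k2 * bp))

def optimum_k(n, Sy2, Sphi2, d40, d04, d22):
    """Optimum constants k1*, k2* of equations (3.4) and (3.5)."""
    lam = 1.0 / n
    by, bp, ds, A, B = starred(d40, d04, d22)
    core = bp * (lam * A + 1.0) - lam * B**2
    k1 = bp * (2.0 - lam * bp / 4.0) / (2.0 * core)
    k2 = Sy2 * (core - B * (2.0 - lam * bp / 4.0)) / (2.0 * Sphi2 * core)
    return k1, k2

def min_mse_t4(n, Sy2, d40, d04, d22):
    """Closed-form minimum MSE of t4, equation (3.6)."""
    lam = 1.0 / n
    by, bp, ds, A, B = starred(d40, d04, d22)
    mse_t2 = lam * Sy2**2 * (by - ds**2 / bp)      # minimum variance (2.11)
    denom = 1.0 + mse_t2 / Sy2**2
    return (mse_t2 / denom
            - lam * bp * (mse_t2 + lam * Sy2**2 * bp / 16.0) / (4.0 * denom))

# Values reported in the chapter's empirical study.
data = dict(n=23, Sy2=4.074, Sphi2=0.110, d40=3.811, d04=6.162, d22=3.996)
k1_opt, k2_opt = optimum_k(**data)
at_optimum = mse_t4(k1_opt, k2_opt, **data)
closed_form = min_mse_t4(data["n"], data["Sy2"], data["d40"], data["d04"], data["d22"])
print(k1_opt, k2_opt, at_optimum, closed_form)
```

The two printed MSE values agree up to floating-point rounding, and moving either constant away from its optimum increases the MSE, as expected for a quadratic with a unique minimum.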
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4. Efficiency Comparison 

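First we compare the efficiency of the proposed estimator under the optimum condition 
with the usual estimator: 

$$V(s_y^2) - MSE(t_4)_{opt} = \frac{\lambda S_y^4\delta_{22}^{*2}}{\beta_2^*(\phi)} + \frac{\dfrac{MSE(t_2)^2}{S_y^4} + \dfrac{\lambda\beta_2^*(\phi)}{4}\left(MSE(t_2) + \dfrac{\lambda S_y^4\beta_2^*(\phi)}{16}\right)}{1 + \dfrac{MSE(t_2)}{S_y^4}} > 0 \text{ always.} \tag{4.1}$$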



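Next we compare the efficiency of the proposed estimator under the optimum condition 
with the ratio estimator. 

From (2.7) and (3.6) we have 

$$MSE(t_1) - MSE(t_4)_{opt} = \lambda S_y^4\left[\sqrt{\beta_2^*(\phi)} - \frac{\delta_{22}^*}{\sqrt{\beta_2^*(\phi)}}\right]^2 + \frac{\dfrac{MSE(t_2)^2}{S_y^4} + \dfrac{\lambda\beta_2^*(\phi)}{4}\left(MSE(t_2) + \dfrac{\lambda S_y^4\beta_2^*(\phi)}{16}\right)}{1 + \dfrac{MSE(t_2)}{S_y^4}} > 0 \text{ always.} \tag{4.2}$$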



Next we compare the efficiency of the proposed estimator under the optimum condition 
with the exponential ratio estimator. 
From (2.8) and (3.6) we have 

$$MSE(t_3) - MSE(t_4)_{opt} = \lambda S_y^4\left[\frac{\sqrt{\beta_2^*(\phi)}}{2} - \frac{\delta_{22}^*}{\sqrt{\beta_2^*(\phi)}}\right]^2 + \frac{\dfrac{MSE(t_2)^2}{S_y^4} + \dfrac{\lambda\beta_2^*(\phi)}{4}\left(MSE(t_2) + \dfrac{\lambda S_y^4\beta_2^*(\phi)}{16}\right)}{1 + \dfrac{MSE(t_2)}{S_y^4}} > 0 \text{ always.} \tag{4.3}$$



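Finally we compare the efficiency of the proposed estimator under the optimum condition 
with the regression estimator: 

$$MSE(t_2) - MSE(t_4)_{opt} = \frac{\dfrac{MSE(t_2)^2}{S_y^4} + \dfrac{\lambda\beta_2^*(\phi)}{4}\left(MSE(t_2) + \dfrac{\lambda S_y^4\beta_2^*(\phi)}{16}\right)}{1 + \dfrac{MSE(t_2)}{S_y^4}} > 0 \text{ always.} \tag{4.4}$$

5. Empirical study 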



We have used the data given in Sukhatme and Sukhatme (1970), p. 256, where 

y = number of villages in the circle, and 

$\phi$ represents a circle consisting of more than five villages. 

n     N     $S_y^2$    $S_\phi^2$    $\delta_{40}$    $\delta_{04}$    $\delta_{22}$
23    89    4.074      0.110         3.811            6.162            3.996

The following table shows the PRE of the different estimators with respect to the usual estimator. 
Table 1: PRE of different estimators 

Estimators    $t_0$    $t_1$      $t_2$      $t_3$      $t_4$
PRE           100      141.898    262.187    254.274    296.016
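The PRE values of Table 1 can be recomputed from the summary statistics of the empirical study. The sketch below is an illustration under stated assumptions: every MSE is expressed in units of $S_y^4$ with $\lambda = 1/n$, $t_0$ denotes the usual estimator $s_y^2$, and the formulas used are (2.7), (2.8), (2.11) and (3.6) as they read in this chapter. The variable names are hypothetical.

```python
# Summary statistics from the empirical study (Sukhatme and Sukhatme data).
n = 23
d40, d04, d22 = 3.811, 6.162, 3.996

lam = 1.0 / n
by, bp, ds = d40 - 1.0, d04 - 1.0, d22 - 1.0   # starred quantities

# MSEs divided by S_y^4, so that S_y^2 itself is not needed for the PRE.
mse = {
    "t0": lam * by,                       # usual estimator s_y^2
    "t1": lam * (by + bp - 2.0 * ds),     # ratio estimator, (2.7)
    "t2": lam * (by - ds**2 / bp),        # regression estimator, (2.11)
    "t3": lam * (by + bp / 4.0 - ds),     # exponential ratio estimator, (2.8)
}
m2 = mse["t2"]
# Adapted estimator t4 under optimum k1, k2, equation (3.6).
mse["t4"] = m2 / (1.0 + m2) - lam * bp * (m2 + lam * bp / 16.0) / (4.0 * (1.0 + m2))

pre = {name: 100.0 * mse["t0"] / value for name, value in mse.items()}
for name, value in pre.items():
    print(f"{name}: {value:.3f}")
```

The recomputed values agree closely with the tabulated figures, which supports the reading of equations (2.7), (2.8), (2.11) and (3.6) used here.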



Conclusion 

The superiority of the proposed estimator is established theoretically by the 
universally true conditions derived in Section 4. The results in Table 1 confirm this 
superiority numerically using a data set previously considered in the literature. 
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Observed Data 


Year     Observed data    Trend values    Single exponential        Double exponential
                                          smoothing (alpha=0.1)     smoothing (alpha=0.9
                                                                    and gamma=0.1)
1995     5.12             5.3372                                    5.12
1996     5.75             5.3036          5.12                      5.707
1997     5.26             5.27            5.183                     5.32857
1998     5.72             5.2364          5.1907                    5.698556
1999     4.64             5.2028          5.24363                   4.765484
2000     5.14             5.1692          5.183267                  5.110884
2003     4.23             5.1356          5.17894                   4.329044
2004     6.026            5.102           5.084046                  5.858346
2005     4.46             5.0684          5.178242                  4.616965
2006     5.52             5.0348          5.106417                  5.4327
Total    51.866           51.86           46.46824                  51.96755
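The single exponential smoothing column can be reproduced with a short recursion. The sketch below rests on an assumption read off the table rather than stated in the text: the first smoothed value (1996) is set equal to the first observation (1995), and each later value is alpha times the previous observation plus (1 - alpha) times the previous smoothed value. The function name is hypothetical.

```python
def single_exponential_smoothing(observations, alpha):
    """One-step-ahead smoothed values for periods 2..n.

    Assumes the first smoothed value equals the first observation,
    which is how the table above appears to be initialised.
    """
    smoothed = [observations[0]]
    for y in observations[1:-1]:
        smoothed.append(alpha * y + (1.0 - alpha) * smoothed[-1])
    return smoothed

# Observed data for 1995-2000 and 2003-2006, as listed in the table.
observed = [5.12, 5.75, 5.26, 5.72, 4.64, 5.14, 4.23, 6.026, 4.46, 5.52]
ses = single_exponential_smoothing(observed, alpha=0.1)
for year, value in zip([1996, 1997, 1998, 1999, 2000, 2003, 2004, 2005, 2006], ses):
    print(year, round(value, 6))
print("Total", round(sum(ses), 5))
```

With alpha = 0.1 the smoothed series reacts slowly to each new observation, which is why the column stays close to 5.1 to 5.2 even when the observed values move between 4.23 and 6.026.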
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